4
I Use This!
Inactive
Analyzed about 6 hours ago. based on code collected about 11 hours ago.

Project Summary

Crawler4j is an open source Java Crawler which provides a simple interface for crawling the web. Using it, you can setup a multi-threaded web crawler in 5 minutes!

Sample UsageFirst, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:

import java.util.ArrayList;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
+ "|png|tiff?|mid|mp2|mp3|mp4"
+ "|wav|avi|mov|mpeg|ram|m4v|pdf"
+ "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

public My

Tags

crawler java multi-threaded opensource web webcrawler

In a Nutshell, crawler4j...

Apache License 2.0
Permitted

Commercial Use

Modify

Distribute

Place Warranty

Sub-License

Private Use

Use Patent Claims

Forbidden

Hold Liable

Use Trademarks

Required

Include Copyright

State Changes

Include License

Include Notice

These details are provided for information only. No information here is legal advice and should not be used as such.

Project Security

Vulnerabilities per Version ( last 10 releases )

There are no reported vulnerabilities

Project Vulnerability Report

Security Confidence Index

Poor security track-record
Favorable security track-record

Vulnerability Exposure Index

Many reported vulnerabilities
Few reported vulnerabilities

Did You Know...

  • ...
    nearly 1 in 3 companies have no process for identifying, tracking, or remediating known open source vulnerabilities
  • ...
    search using multiple tags to find exactly what you need
  • ...
    there are over 3,000 projects on the Open Hub with security vulnerabilities reported against them
  • ...
    learn about Open Hub updates and features on the Open Hub blog
About Project Security

Languages

Java
73%
XML
11%
Groovy
12%
5 Other
4%

30 Day Summary

Mar 20 2024 — Apr 19 2024

12 Month Summary

Apr 19 2023 — Apr 19 2024