4
I Use This!
Inactive
Analyzed about 17 hours ago. based on code collected 2 days ago.

Project Summary

Crawler4j is an open source Java Crawler which provides a simple interface for crawling the web. Using it, you can setup a multi-threaded web crawler in 5 minutes!

Sample UsageFirst, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:

import java.util.ArrayList;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
+ "|png|tiff?|mid|mp2|mp3|mp4"
+ "|wav|avi|mov|mpeg|ram|m4v|pdf"
+ "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

public My

Tags

crawler java multi-threaded opensource web webcrawler

In a Nutshell, crawler4j...

Apache License 2.0
Permitted

Commercial Use

Modify

Distribute

Place Warranty

Sub-License

Private Use

Use Patent Claims

Forbidden

Hold Liable

Use Trademarks

Required

Include Copyright

State Changes

Include License

Include Notice

These details are provided for information only. No information here is legal advice and should not be used as such.

Project Security

Vulnerabilities per Version ( last 10 releases )

There are no reported vulnerabilities

Project Vulnerability Report

Security Confidence Index

Poor security track-record
Favorable security track-record

Vulnerability Exposure Index

Many reported vulnerabilities
Few reported vulnerabilities

Did You Know...

  • ...
    in 2016, 47% of companies did not have formal process in place to track OS code
  • ...
    compare projects before you chose one to use
  • ...
    55% of companies leverage OSS for production infrastructure
  • ...
    anyone with an Open Hub account can update a project's tags
About Project Security

Languages

Java
73%
XML
11%
Groovy
12%
5 Other
4%

30 Day Summary

Mar 24 2025 — Apr 23 2025

12 Month Summary

Apr 23 2024 — Apr 23 2025