Inactive
Analyzed about 19 hours ago, based on code collected about 20 hours ago.

Project Summary

Crawler4j is an open source Java crawler that provides a simple interface for crawling the web. With it, you can set up a multi-threaded web crawler in 5 minutes!

Sample Usage

First, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:

import java.util.ArrayList;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

public My
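
The URL filter in the sample above can be tried on its own, without crawler4j on the classpath. The following is a minimal sketch, not the project's implementation: the class name UrlFilterSketch, the method isCrawlable, and the prefix http://www.example.com/ are all hypothetical names chosen for illustration. Only the regular expression comes from the sample.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Standalone sketch: uses only java.util.regex, no crawler4j classes.
final class UrlFilterSketch {

    // Same extension pattern as the sample: matches URLs that end in a
    // stylesheet, script, image, audio, video, or archive extension.
    private static final Pattern FILTERS = Pattern.compile(
            ".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    // Hypothetical site prefix (an assumption, not taken from the sample).
    private static final String ALLOWED_PREFIX = "http://www.example.com/";

    // Returns true when the URL is on the allowed site and does not end
    // in one of the filtered extensions.
    static boolean isCrawlable(String url) {
        String href = url.toLowerCase(Locale.ROOT);
        return !FILTERS.matcher(href).matches()
                && href.startsWith(ALLOWED_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(isCrawlable("http://www.example.com/index.html"));   // true
        System.out.println(isCrawlable("http://www.example.com/logo.PNG"));     // false
        System.out.println(isCrawlable("http://other.example.org/index.html")); // false
    }
}
```

Lowercasing the URL before matching keeps the check case-insensitive, so logo.PNG is rejected the same way as logo.png. In a real crawler class, a check like this would sit in whichever method decides which URLs should be crawled.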

Tags

crawler java multi-threaded opensource web webcrawler

Apache License 2.0
Permitted: Commercial Use, Modify, Distribute, Place Warranty, Sub-License, Private Use, Use Patent Claims

Forbidden: Hold Liable, Use Trademarks

Required: Include Copyright, State Changes, Include License, Include Notice

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.

Project Security

Vulnerabilities per Version (last 10 releases)

There are no reported vulnerabilities.

Security Confidence Index (scale: poor security track record to favorable security track record)

Vulnerability Exposure Index (scale: many reported vulnerabilities to few reported vulnerabilities)

Languages

Java: 73%
XML: 11%
Groovy: 12%
5 Other: 4%

30 Day Summary: May 12 2025 to Jun 11 2025

12 Month Summary: Jun 11 2024 to Jun 11 2025