4
I Use This!
Inactive
Analyzed 1 day ago. based on code collected 1 day ago.

Project Summary

Crawler4j is an open source Java Crawler which provides a simple interface for crawling the web. Using it, you can setup a multi-threaded web crawler in 5 minutes!

Sample UsageFirst, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:

import java.util.ArrayList;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
+ "|png|tiff?|mid|mp2|mp3|mp4"
+ "|wav|avi|mov|mpeg|ram|m4v|pdf"
+ "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

public My

Tags

crawler java multi-threaded opensource web webcrawler

In a Nutshell, crawler4j...

Apache License 2.0
Permitted

Commercial Use

Modify

Distribute

Place Warranty

Sub-License

Private Use

Use Patent Claims

Forbidden

Hold Liable

Use Trademarks

Required

Include Copyright

State Changes

Include License

Include Notice

These details are provided for information only. No information here is legal advice and should not be used as such.

Project Security

Vulnerabilities per Version ( last 10 releases )

There are no reported vulnerabilities

Project Vulnerability Report

Security Confidence Index

Poor security track-record
Favorable security track-record

Vulnerability Exposure Index

Many reported vulnerabilities
Few reported vulnerabilities

Did You Know...

  • ...
    in 2016, 47% of companies did not have formal process in place to track OS code
  • ...
    anyone with an Open Hub account can update a project's tags
  • ...
    65% of companies leverage OSS to speed application development in 2016
  • ...
    check out hot projects on the Open Hub
About Project Security

Languages

Java
73%
XML
11%
Groovy
12%
5 Other
4%

30 Day Summary

Mar 18 2024 — Apr 17 2024

12 Month Summary

Apr 17 2023 — Apr 17 2024