Crawler4j is an open-source Java crawler that provides a simple interface for crawling the web. Using it, you can set up a multi-threaded web crawler in 5 minutes!
Sample Usage
First, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:
import java.util.ArrayList;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    public My
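The sample above is cut off before it shows how the filter pattern is used. As a rough illustration only, the following standalone sketch applies the same regular expression to a few URLs using nothing but the Java standard library. The class name UrlFilterDemo, the method isFiltered, and the example.com URLs are hypothetical and are not part of Crawler4j. Lower-casing the URL before matching is an assumption made here so that upper-case extensions are also caught.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Crawler4j: it only demonstrates
// what the extension filter from the sample matches.
class UrlFilterDemo {

    // Same pattern as in the sample: any URL that ends in one of the
    // listed static-resource or media extensions.
    static final Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    // Returns true when the URL ends in a filtered extension.
    // Lower-casing first is an assumption made for this sketch.
    static boolean isFiltered(String url) {
        return FILTERS.matcher(url.toLowerCase(Locale.ROOT)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFiltered("http://example.com/logo.PNG"));      // true
        System.out.println(isFiltered("http://example.com/index.html"));    // false
        System.out.println(isFiltered("http://example.com/a.pdf?dl=1"));    // false
    }
}
```

Because the pattern is anchored at the end of the string, a URL with a query string after the extension is not filtered, which may or may not be what a given crawl needs.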
Initially intended to be entirely based around 3D graphics in Scala, Sgine has expanded to provide additional functionality for high performance interactive applications.
JMrTools is inspired by the MrTools Perl project (written by Robert Marino). This Java project implements ssh, telnet, scp, rsync, sftp, ftp, and stdio, providing execution or transfer of commands, scripts, or files on multiple hosts concurrently.