Tags: Browse Projects

Select a tag to browse associated projects and drill deeper into the tag cloud.

Scrapy

  Analyzed about 13 hours ago

Scrapy is a fast, high-level scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

49.2K lines of code

50 current contributors

15 days since last commit

20 users on Open Hub

High Activity

Crawley

  Analyzed about 17 hours ago

Pythonic crawling / scraping framework built on Eventlet.

Features:
* High-speed web crawler built on Eventlet.
* Supports database engines such as PostgreSQL, MySQL, Oracle, and SQLite.
* Command-line tools.
* Extract data using your favourite tool: XPath or PyQuery (a jQuery-like library for Python).
* Cookie handlers.
* Very easy to use (see the example).

Documentation: http://packages.python.org/crawley/

3.69K lines of code

0 current contributors

almost 9 years since last commit

1 user on Open Hub

Inactive

QuickCode (formerly ScraperWiki)

  No analysis available

QuickCode is the new name for the original ScraperWiki product. We renamed it, as it isn’t a wiki or just for scraping any more. It’s a Python and R data analysis environment, ideal for economists, statisticians and data managers who are new to coding.

0 lines of code

1 current contributor

Time since last commit not available

1 user on Open Hub

Activity Not Available
Licenses: No declared licenses

robots.txt-go

  Analyzed about 10 hours ago

robots.txt exclusion protocol implementation for the Go language (golang). It contains a logically separate parser and checker: the parser may be used to check a robots.txt file for correctness, and the checker may be used for "may I visit this URL?" queries against the parsed robots.txt data.
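The project itself is a Go library, but the parse-then-check split it describes can be illustrated with Python's standard-library `urllib.robotparser`, which offers the same two-phase workflow; this is an analogue, not the project's own API.

```python
from urllib.robotparser import RobotFileParser

# A small robots.txt body used purely for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

# Phase 1: parse the robots.txt data.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Phase 2: "may I visit this URL?" queries against the parsed data.
print(parser.can_fetch("MyBot", "https://example.com/public/page"))   # True
print(parser.can_fetch("MyBot", "https://example.com/private/page"))  # False
```

Separating parsing from checking means the file is validated once and the cheap per-URL queries can then run many times against the parsed structure.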

1.11K lines of code

2 current contributors

over 1 year since last commit

0 users on Open Hub

Very Low Activity

Xidel

  No analysis available

Xidel is a command-line tool to download web pages and extract data from them. It can download files over HTTP(S) connections; follow redirections, links, or extracted values; and also process local files. Data can be extracted using XPath 2.0 or XQuery 1.0 expressions, JSONiq, CSS 3 selectors, and custom pattern-matching templates that work like an annotated version of the processed page. The extracted values can then be exported as plain text, XML, HTML, or JSON, or assigned to variables for use in other extraction expressions or export to the shell. There is also an online CGI service for testing.

0 lines of code

1 current contributor

Time since last commit not available

0 users on Open Hub

Activity Not Available
Licenses: GPL