Scrapy is a fast, high-level web crawling and scraping framework used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Pythonic Crawling / Scraping Framework Built on Eventlet
Features
* High-speed web crawler built on Eventlet.
* Supports database engines such as PostgreSQL, MySQL, Oracle, and SQLite.
* Command-line tools.
* Extract data using your favourite tool: XPath or PyQuery (a jQuery-like library for Python).
* Cookie Handlers.
* Very easy to use (see the example).
Documentation
http://packages.python.org/crawley/
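Crawley's own API is not reproduced here, but the XPath-style extraction it mentions can be sketched with Python's standard library alone (ElementTree supports a limited XPath subset); the markup below is a hypothetical page fragment, an assumption for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for a fetched page.
html = """
<html>
  <body>
    <div class="item"><span>First</span></div>
    <div class="item"><span>Second</span></div>
  </body>
</html>
"""

root = ET.fromstring(html)
# ElementTree understands a limited XPath subset, enough for this query:
# every <span> inside a <div class="item">.
texts = [span.text for span in root.findall(".//div[@class='item']/span")]
print(texts)  # ['First', 'Second']
```

A full scraping framework would use a more complete XPath engine (such as lxml), but the select-then-extract pattern is the same.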
QuickCode is the new name for the original ScraperWiki product. We renamed it, as it isn’t a wiki or just for scraping any more.
It’s a Python and R data analysis environment, ideal for economists, statisticians and data managers who are new to coding.
A robots.txt exclusion protocol implementation for the Go language (golang).
It contains a logically separate parser and checker.
The parser may be used to check a robots.txt file for correctness.
The checker may be used for may-I-visit-this-URL queries against the parsed robots.txt data.
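The Go package itself is not shown here, but the same parse-then-check workflow exists in Python's standard library (`urllib.robotparser`), which this sketch uses; the rules and URLs are assumptions for illustration:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Parse robots.txt rules supplied as a list of lines (hypothetical rules).
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# May-I-visit-this-URL queries against the parsed data.
print(rp.can_fetch("mybot", "https://example.com/private/page"))  # False
print(rp.can_fetch("mybot", "https://example.com/index.html"))    # True
```

Separating parsing from checking, as the Go library does, lets a crawler parse each site's robots.txt once and answer many per-URL queries cheaply.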
Xidel is a command line tool to download web pages and extract data from them.
It can download files over HTTP/HTTPS connections, follow redirects, links, or extracted values, and also process local files.
The data can be extracted using XPath 2.0 or XQuery 1.0 expressions, JSONiq, CSS 3 selectors, and custom pattern-matching templates that act like an annotated version of the processed page.
The extracted values can then be exported as plain text/XML/HTML/JSON, or assigned to variables to be used in other extraction expressions or exported to the shell.
There is also an online CGI service for testing.