Zeitcrawler

Inactive

Analyzed 1 day ago, based on code collected 2 days ago.

Project Summary

A specialized crawler for the German newspaper 'Die Zeit'.

Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes. It extracts the text of each article from the HTML markup and saves it to a raw text file.
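The crawl loop described above can be illustrated with a minimal sketch. The project itself is written in Perl; the Python below is not its code, only an assumed breadth-first design using the standard library. The names `ArticleParser` and `crawl`, the injected `fetch` callable, and the `allowed_host` and `max_pages` parameters are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ArticleParser(HTMLParser):
    """Collects visible text and outgoing links from one HTML page."""

    SKIPPED = {"script", "style"}  # element content that is not article text

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def crawl(seeds, fetch, allowed_host, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns {url: plain_text}.

    `fetch` is a caller-supplied function mapping a URL to an HTML string,
    or None when the page cannot be retrieved (an assumption of this sketch).
    """
    frontier = deque(seeds)
    seen = set(seeds)
    texts = {}
    while frontier and len(texts) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = ArticleParser(url)
        parser.feed(html)
        parser.close()
        texts[url] = "\n".join(parser.chunks)
        for link in parser.links:
            link = link.split("#", 1)[0]  # drop fragments to avoid duplicates
            if urlparse(link).netloc == allowed_host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return texts
```

Passing `fetch` in as a parameter keeps the sketch free of network code; a caller could wrap `urllib.request.urlopen` and write each returned text to its own file.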

The project includes scripts to convert the raw text output into XML for further use with natural language processing tools.
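As a rough illustration of such a conversion step, the sketch below wraps each non-empty line of a raw text file in a paragraph element. The listing does not describe the project's actual XML schema, so the element names `text` and `p`, the `source` attribute, and the function name `text_to_xml` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def text_to_xml(raw_text, source_url):
    """Wrap each non-empty line of raw_text in a <p> element (assumed schema)."""
    root = ET.Element("text", {"source": source_url})
    for paragraph in raw_text.split("\n"):
        paragraph = paragraph.strip()
        if paragraph:
            ET.SubElement(root, "p").text = paragraph
    # encoding="unicode" returns a str; special characters are escaped.
    return ET.tostring(root, encoding="unicode")
```

Building the tree with `ElementTree` rather than concatenating strings means characters such as `&` and `<` in article text are escaped automatically.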

In a Nutshell, Zeitcrawler...

has had 29 commits made by 2 contributors
representing 1,649 lines of code
is mostly written in Perl
with an average number of source code comments
has a young, but established codebase
maintained by nobody
with stable Y-O-Y commits
took an estimated 1 year of effort (COCOMO model)
starting with its first commit in July 2012
ending with its most recent commit over 12 years ago

Quick Reference

Project Links:

Homepage
Download

Code Locations:

https://github.com/adbar/zeitcra...

Managers:

Adrien Barbaresi

Licenses

GNU General Public License v3.0 only

Permitted: Commercial Use, Modify, Distribute, Place Warranty, Use Patent Claims

Forbidden: Sub-License, Hold Liable

Required: Distribute Original, Disclose Source, Include Copyright, State Changes, Include License, Include Install Instructions

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.

This project has no vulnerabilities reported against it.
