Tags: Browse Projects

Select a tag to browse associated projects and drill deeper into the tag cloud.

html-snapshots

  Analyzed 3 days ago

A selector-based html snapshot tool using PhantomJS that sources sitemap.xml, robots.txt, or arbitrary input

6.34K lines of code

1 current contributor

7 days since last commit

1 user on Open Hub

High Activity
0.0

polipus

  Analyzed about 23 hours ago

Polipus: distributed and scalable web-crawler framework

2.45K lines of code

0 current contributors

over 9 years since last commit

1 user on Open Hub

Inactive
0.0
Licenses: No declared licenses
Tags: crawler

Corpusexplorer.SDK.Extern

  Analyzed about 8 hours ago

This project is part of the CorpusExplorer software development kit (CorpusExplorer SDK) [further information is available here]. The SDK and all of its parts can be used free of charge for research and educational projects. This part of the SDK is licensed under GPL-3.0. You can use this project to extend CorpusExplorer, to connect your own program to CorpusExplorer (API interface), or to develop or extend your own program independently of CorpusExplorer, building on proven solutions.

33.8K lines of code

0 current contributors

over 7 years since last commit

1 user on Open Hub

Inactive
5.0

Wikipedia-API

  Analyzed about 11 hours ago

Python wrapper for Wikipedia

2.15K lines of code

2 current contributors

4 days since last commit

1 user on Open Hub

Moderate Activity
0.0

Spatie Crawler

  Analyzed about 24 hours ago

An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript.

2.25K lines of code

0 current contributors

4 months since last commit

1 user on Open Hub

Very Low Activity
0.0
Licenses: No declared licenses

psilib

  No analysis available

Python library that lets applications process Portable Site Information (PSI). PSI is an XML standard that allows entire websites to be exchanged between content management tools without feature loss.

0 lines of code

0 current contributors

0 since last commit

0 users on Open Hub

Activity Not Available
0.0
Mostly written in language not available
Licenses: BSD-3-Clause

Maryam-project

  Analyzed 3 days ago

Maryam is a full-featured web reconnaissance framework written in Python. Complete with independent modules, built-in convenience functions, interactive help, and command completion, Maryam provides a powerful environment in which open source web-based reconnaissance can be conducted quickly and thoroughly. Maryam is a completely modular framework and makes it easy for even the newest Python developers to contribute. Each module is a subclass of the "module" class. The "module" class is a customized "cmd" interpreter equipped with built-in functionality that provides simple interfaces to common tasks such as standardizing output and making web requests, so most of the hard work has already been done. Building a module is simple and takes little more than a few minutes.

9.05K lines of code

3 current contributors

2 months since last commit

0 users on Open Hub

Very Low Activity
0.0

Zeitcrawler

  Analyzed 3 days ago

A specialized crawler for the German newspaper 'Die Zeit'. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.
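
The crawl loop described above (start from seed links, fetch each page, strip the text out of the HTML, queue newly found links) could be sketched roughly as follows. This is an illustration only and not the project's actual code: the names `crawl`, `fetch`, and `max_pages` are hypothetical, the page fetcher is passed in as a callable so the sketch stays independent of any network library, and the extracted text is returned in a dictionary rather than written to a raw text file.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class _PageParser(HTMLParser):
    """Collects link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from `seeds`; returns {url: plain_text}.

    `fetch` is a hypothetical callable mapping a URL to an HTML string,
    or to None when the page cannot be retrieved.
    """
    seeds = list(seeds)
    queue = deque(seeds)
    seen = set(seeds)
    texts = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = _PageParser()
        parser.feed(html)
        texts[url] = " ".join(parser.text_parts)
        # Resolve relative links and queue the ones not seen before.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return texts
```

A real newspaper crawler would additionally need to restrict links to the target site, respect robots.txt and rate limits, and isolate the article body from navigation text; none of that is shown here.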

1.64K lines of code

0 current contributors

over 10 years since last commit

0 users on Open Hub

Inactive
0.0

Équipe Crawler

  Analyzed 1 day ago

A specialized crawler for the French sport newspaper L'Équipe. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.

401 lines of code

0 current contributors

about 12 years since last commit

0 users on Open Hub

Inactive
0.0

German Political Speeches Corpus-Builder

  Analyzed 2 days ago

Tools to crawl repositories of official German speeches in order to gather a corpus. More information to come. A complete version of the corpus, including a visualization tool, is available here: http://purl.org/corpus/german-speeches

1.08K lines of code

0 current contributors

about 11 years since last commit

0 users on Open Hub

Inactive
0.0