Projects tagged ‘crawler’

Corpusexplorer.SDK.Extern

This project is part of the CorpusExplorer software development kit (CorpusExplorer SDK) [more information is available here]. The SDK and all of its parts can be used free of charge for research and educational projects. This part of the SDK is licensed under GPL-3.0. You can ... [More]

33.8K lines of code

0 current contributors

about 8 years since last commit

1 user on Open Hub

Inactive

0 Reviews

Mostly written in C#

Licenses: GPL2

Tags crawler importer linguistic linguistics nlp scraper webscraper xmlparser

Wikipedia-API

Analyzed 1 day ago

Python wrapper for Wikipedia

2.43K lines of code

2 current contributors

2 days since last commit

1 user on Open Hub

Moderate Activity

0 Reviews

Mostly written in Python

Licenses: mit

Tags api bot crawler mediawiki mediawikiapi python wiki wikipedia wikipedia-mining

Spatie Crawler

Analyzed about 1 hour ago

An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript.

2.36K lines of code

0 current contributors

about 20 hours since last commit

1 user on Open Hub

Low Activity

0 Reviews

Mostly written in PHP

Licenses: No declared licenses

Tags crawler GoogleChrome Guzzle headless javascript php puppeteer spatie

psilib

No analysis available

Python library that lets applications process Portable Site Information (PSI). PSI is an XML standard that allows entire websites to be exchanged between content management tools without feature loss.

0 lines of code

0 current contributors

0 since last commit

0 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: BSD-3-Clause

Tags contentmanagementsystem crawler library portable python site web

Maryam-project

Analyzed about 22 hours ago

Maryam is a full-featured Web Reconnaissance framework written in Python. Complete with independent modules, built in convenience functions, interactive help, and command completion, Maryam provides a powerful environment in which open source web-based reconnaissance can be conducted quickly and ... [More]

9.05K lines of code

3 current contributors

8 months since last commit

0 users on Open Hub

Very Low Activity

0 Reviews

Mostly written in Python

Licenses: lgpv3_or_...

Tags #core crawler #cui framework identify Maryam #maryam-framework #maryam-project #owasp #owasp-maryam-project python scan 8 more...

Zeitcrawler

Analyzed 1 day ago

A specialized crawler for the German newspaper 'Die Zeit'. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text ... [More]

1.64K lines of code

0 current contributors

about 11 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Perl

Licenses: gpl3

Tags academic computational_linguistics corpus corpus_linguistics crawler digital_humanities natural_language_processing nlp perl unix webcrawler xml

Équipe Crawler

Analyzed about 5 hours ago

A specialized crawler for the French sports newspaper L'Équipe. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw ... [More]

401 lines of code

0 current contributors

over 12 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Perl

Licenses: gpl3

Tags academic computational_linguistics corpus corpus_linguistics crawler digital_humanities natural_language_processing nlp perl unix webcrawler xml

German Political Speeches Corpus-Builder

Analyzed about 6 hours ago

Tools for crawling repositories of official German speeches in order to build a corpus. More information to come. A complete version of the corpus, including a visualization tool, is available here: http://purl.org/corpus/german-speeches

1.08K lines of code

0 current contributors

over 11 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Perl

Licenses: gpl3

Tags academic computational_linguistics corpus corpus_linguistics crawler digital_humanities natural_language_processing nlp perl unix webcrawler xml
