Tags : Browse Projects

Équipe Crawler

  Analyzed about 6 hours ago

A specialized crawler for the French sports newspaper L'Équipe. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it to a raw text file. The project includes scripts to convert the raw text into XML for further use with natural language processing tools.

401 lines of code

0 current contributors

over 11 years since last commit

0 users on Open Hub

Inactive

German Political Speeches Corpus-Builder

  Analyzed about 6 hours ago

Tools to crawl repositories of official German speeches in order to build a corpus. More information to come. A complete version of the corpus, including a visualization tool, is available at http://purl.org/corpus/german-speeches

1.08K lines of code

0 current contributors

over 10 years since last commit

0 users on Open Hub

Inactive

Spelt

Claimed by Translate

Analyzed about 6 hours ago

Spelt is a simple graphical program for classifying words in a language. It is designed in particular to identify word roots and classify them by part of speech. It was initially developed to simplify work on spell checkers, but you might find it useful for many other purposes.

3.83K lines of code

0 current contributors

about 12 years since last commit

0 users on Open Hub

Inactive

purepos

  Analyzed about 12 hours ago

PurePos morphological disambiguator.

6.84K lines of code

1 current contributor

almost 4 years since last commit

0 users on Open Hub

Inactive

CorpusCatcher

Claimed by Translate

Analyzed about 20 hours ago

CorpusCatcher is a corpus collection toolset. It can help you build language-specific or topic-specific corpora from publicly available web resources. Such corpora are useful for many purposes, especially as data for building spell checkers.

813 lines of code

0 current contributors

about 12 years since last commit

0 users on Open Hub

Inactive

Perseus Digital Library

  Analyzed about 19 hours ago

This will be the base repo for all text and annotation data published in the PDL.

0 lines of code

0 current contributors

almost 9 years since last commit

0 users on Open Hub

Activity Not Available
Primary language: not available
Licenses: No declared licenses

Swinburne

  Analyzed 1 day ago

Poems and Ballads (1866) -- a skeuomorphic edition

26K lines of code

0 current contributors

over 8 years since last commit

0 users on Open Hub

Inactive