Projects tagged ‘corpus_linguistics’

Text Encoding Initiative

Analyzed 1 day ago

The TEI is an international and interdisciplinary community-based open standard used by research projects, libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.

553K lines of code

14 current contributors

12 days since last commit

3 users on Open Hub

Moderate Activity

0 Reviews

Mostly written in XSL Transformation

Licenses: BSD-2-Clause, cc-by-3

RelEx Semantic Relationship Extractor

Analyzed about 6 hours ago

RelEx is an English-language semantic relationship extractor, built on the Carnegie-Mellon Link Grammar parser. It can identify dependency-grammar dependencies, such as subject, object, indirect object and many other relationships between words in a sentence. It can also provide part-of-speech ...

11.8K lines of code

4 current contributors

4 months since last commit

2 users on Open Hub

Very Low Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags ai anaphora artificial_intelligence computational_linguistics corpus_linguistics dependency dependency_grammar grammar hobbs java linguistics natural_language 8 more...

LexAt Lexical/Corpus Statistics

No analysis available

LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (Mihalcea algorithm).

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available

0 Reviews

Primary language not available

Licenses: apache_2

Tags computational_linguistics corpora corpus corpus_linguistics database java linguistics natural_language natural_language_processing nlp opencog perl 1 more...

opencorpora

Analyzed about 13 hours ago

An engine for creating and annotating textual corpora

38.6K lines of code

3 current contributors

8 months since last commit

1 user on Open Hub

Very Low Activity

0 Reviews

Mostly written in PHP

Licenses: gpl

Tags computational_linguistics corpora corpus corpus_linguistics crowdsourcing disambiguation linguistics natural-language-processing natural_language_processing nlp part_of_speech russian 1 more...

porter-stem.vim

Analyzed about 17 hours ago

An implementation of the Porter stemming algorithm in Vim script. See https://www.ohloh.net/p/stem-search-vim for a script that makes use of it.

205 lines of code

0 current contributors

over 7 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Vim Script

Licenses: mit

Tags corpus_linguistics linguistics stem vim

stem-search.vim

Analyzed about 7 hours ago

StmSrch is a reverse-stem searching script. It implements the Porter stemming algorithm, by Martin Porter. It also handles irregular verbs and noun pluralizations. This script can be useful for searching or scanning through corpus files. Each word input to the :StmSrch command will be stemmed ...

308 lines of code

0 current contributors

almost 14 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Vim Script

Licenses: mit

Tags corpus corpus_linguistics nlp porter search stem vim

He Kupu Tawhito

Analyzed about 7 hours ago

979 lines of code

1 current contributor

almost 5 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in shell script

Licenses: No declared licenses

Tags corpus_linguistics exist-db humanities multilingual tei textanalysis textencodinginitiative xml xquery xslt xslt20 xsl-transformation

Zeitcrawler

Analyzed about 19 hours ago

A specialized crawler for the German newspaper 'Die Zeit'. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text ...

1.64K lines of code

0 current contributors

about 10 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Perl

Licenses: gpl3

Tags academic computational_linguistics corpus corpus_linguistics crawler digital_humanities natural_language_processing nlp perl unix webcrawler xml

Équipe Crawler

Analyzed about 7 hours ago

A specialized crawler for the French sports newspaper L'Équipe. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw ...

401 lines of code

0 current contributors

over 11 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Perl

Licenses: gpl3

Tags academic computational_linguistics corpus corpus_linguistics crawler digital_humanities natural_language_processing nlp perl unix webcrawler xml

German Political Speeches Corpus-Builder

Analyzed about 7 hours ago

Tools to crawl official German speech repositories in order to build a corpus. More information to come. A complete version of the corpus, including a visualization tool, is available here: http://purl.org/corpus/german-speeches

1.08K lines of code

0 current contributors

over 10 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Perl

Licenses: gpl3

Tags academic computational_linguistics corpus corpus_linguistics crawler digital_humanities natural_language_processing nlp perl unix webcrawler xml
