Projects tagged ‘tokenizer’

DKPro Core

Analyzed 1 day ago

DKPro Core is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. Many powerful and state-of-the-art NLP components are already freely available in the NLP research community. New and improved components are being developed and released ...

149K lines of code

8 current contributors

10 months since last commit

6 users on Open Hub

Very Low Activity

0 Reviews

Mostly written in Java

Licenses: apache_2, gpl3

Lexical Analyzer Generator Quex

Analyzed 1 day ago

Generator of extremely fast lexical analysers. Sophisticated input/buffer management. Many character encodings (incl. ASCII, UTF8, UTF16, RUSCII, ...) are directly supported. Regular expressions are specified in the lex/flex style. Features: * Support for Unicode and many other character ...

76.2K lines of code

2 current contributors

about 5 years since last commit

2 users on Open Hub

Inactive

0 Reviews

Mostly written in Python

Licenses: lgpl

Tags analyzer C C++ compiler lexical parser regular_expression rule tokenizer unicode word_processor

SourceEditor

Analyzed 1 day ago

Library to tokenize, edit and override PHP classes

1.01K lines of code

0 current contributors

about 11 years since last commit

1 user on Open Hub

Inactive

0 Reviews

Mostly written in PHP

Licenses: No declared licenses

Tags tokenizer

pyTokenizer

Analyzed 1 day ago

A streaming tokenizer.

362 lines of code

1 current contributor

about 5 years since last commit

1 user on Open Hub

Inactive

0 Reviews

Mostly written in Python

Licenses: apache_2

Tags parser python3 tokenizer

trie keys - values

Analyzed 1 day ago

Creates a compressed trie that maps keys to values and values to keys. Compression is applied to the front end (shared prefixes) of keys. Useful for building lightweight reserved-word tables where memory or processing power is constrained. Written in C.

1.18K lines of code

0 current contributors

over 15 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Java

Licenses: No declared licenses

Tags parser symbols tokenizer trie
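
The key-to-value and value-to-key lookup described for this project can be illustrated with a minimal sketch. It is not the project's C implementation: the class and method names are hypothetical, and it uses a plain character trie that shares common prefixes, which only approximates the front-end compression the description mentions.

```python
class ReservedWords:
    """Hypothetical two-way reserved-word table backed by a character trie."""

    _END = ""  # sentinel child key marking the end of a stored word

    def __init__(self):
        self.root = {}      # nested dicts, one level per character
        self.by_value = {}  # reverse index: value -> key

    def add(self, key, value):
        node = self.root
        for ch in key:
            # Keys with a common prefix reuse the same nodes.
            node = node.setdefault(ch, {})
        node[self._END] = value
        self.by_value[value] = key

    def value_of(self, key):
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(self._END)

    def key_of(self, value):
        return self.by_value.get(value)


table = ReservedWords()
table.add("if", 1)
table.add("int", 2)   # shares the "i" node with "if"
```

Here `table.value_of("int")` looks a word up by key and `table.key_of(1)` goes the other way; a prefix such as `"in"` that was never added returns `None`.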

tokkens-ruby

Analyzed about 20 hours ago

Basic text-to-numbers tokenizer for machine learning. Tokkens makes it easy to apply a vector space model to text documents and is targeted towards machine learning. It provides a mapping between numbers and tokens (strings).

474 lines of code

0 current contributors

over 8 years since last commit

0 users on Open Hub

Inactive

0 Reviews

Mostly written in Ruby

Licenses: mit

Tags machinelearning ruby rubygem text_processing tokenizer vectorspacemodel
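
The mapping between numbers and tokens mentioned in this description can be sketched roughly as follows. This is an illustration in Python, not the Tokkens Ruby API: the `Vocabulary` class, its methods, and the whitespace tokenization are all assumptions made for the example.

```python
class Vocabulary:
    """Hypothetical helper that gives each distinct token a stable integer id."""

    def __init__(self):
        self.token_to_id = {}
        self.id_to_token = []

    def encode(self, text):
        """Split text on whitespace and return one id per token."""
        ids = []
        for token in text.lower().split():
            if token not in self.token_to_id:
                # First time this token is seen: assign the next free id.
                self.token_to_id[token] = len(self.id_to_token)
                self.id_to_token.append(token)
            ids.append(self.token_to_id[token])
        return ids

    def decode(self, ids):
        """Map ids back to their token strings."""
        return [self.id_to_token[i] for i in ids]


vocab = Vocabulary()
ids = vocab.encode("the cat saw the dog")
```

Repeated tokens receive the same id, so the resulting list of numbers can be counted into a term-frequency vector for a vector space model, and `vocab.decode(ids)` recovers the original tokens.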
