DKPro Core is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
Many powerful and state-of-the-art NLP components are already freely available in the NLP research community. New and improved components are being developed and released continuously. The components cover the whole range of NLP-related processing tasks. DKPro Core provides wrappers for such third-party tools as well as original NLP components. DKPro Core builds heavily on uimaFIT, which allows for rapid and easy development of NLP processing pipelines.
Generator of extremely fast lexical analysers. Sophisticated input/buffer management. Many character encodings (incl. ASCII, UTF8, UTF16, RUSCII, ...) are directly supported. Regular expressions are specified in the lex/flex style.
Features:
* Support for Unicode and many other character encodings.
* Modes with inheritance relationships and transition rules.
* Sophisticated buffer management.
* Include stacks.
* Customized token classes.
* Template compression for code size reduction.
* Path compression for code size reduction.
* Possibility of indentation based lexical analysis (INDENT, DEDENT, NODENT).
* Produces direct coded lexical analyzers.
* Adjustable implicit line and column number counting.
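To illustrate what indentation-based lexical analysis means in the feature list above, here is a minimal Python sketch. It is not the generator's output or API: the function name `indentation_tokens` and the `LINE` token are hypothetical, and NODENT is assumed to mean "a line whose indentation is unchanged from the previous one".

```python
def indentation_tokens(lines):
    """Yield (kind, text) pairs, emitting one indentation event per line.

    Assumption: INDENT = deeper than the enclosing block,
    DEDENT = one enclosing block closed, NODENT = same depth as before.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for line in lines:
        if not line.strip():
            continue  # blank lines do not affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", "")
        elif width == stack[-1]:
            yield ("NODENT", "")
        else:
            # Close every block that is deeper than the current line.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", "")
            if width != stack[-1]:
                raise ValueError(f"inconsistent indentation: {line!r}")
        yield ("LINE", line.strip())


source = ["a", "  b", "  c", "d"]
events = [kind for kind, _ in indentation_tokens(source) if kind != "LINE"]
print(events)
```

A stack of indentation widths is enough to decide which of the three events a line produces; a real generated analyzer would interleave these events with its ordinary pattern matching.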
Creates a compressed trie that maps keys to values and values to keys. Compression is on the front end of keys. Useful for lightweight reserved-word creation in constrained memory/processor power situations. Written in C.
Basic text to numbers tokenizer for machine learning.
Tokkens makes it easy to apply a vector space model to text documents, targeted towards use with machine learning. It provides a mapping between numbers and tokens (strings).
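The idea of a two-way mapping between tokens and numbers can be sketched in a few lines of Python. This is an illustration only and does not reproduce Tokkens's own API: the class `TokenIndex`, its methods, and the whitespace-based `tokenize` helper are all hypothetical names chosen for the example.

```python
class TokenIndex:
    """Bidirectional mapping between token strings and integer ids."""

    def __init__(self):
        self._id_by_token = {}   # token -> number
        self._token_by_id = []   # number -> token (list index is the id)

    def get(self, token):
        """Return the id for a token, assigning the next free id if unseen."""
        if token not in self._id_by_token:
            self._id_by_token[token] = len(self._token_by_id)
            self._token_by_id.append(token)
        return self._id_by_token[token]

    def find(self, number):
        """Return the token that was assigned the given id."""
        return self._token_by_id[number]


def tokenize(text, index):
    """Lower-case the text, split on whitespace, and map each word to its id."""
    return [index.get(word) for word in text.lower().split()]


index = TokenIndex()
ids = tokenize("The cat saw the dog", index)
print(ids)            # [0, 1, 2, 0, 3]
print(index.find(2))  # saw
```

Repeated words receive the same number, so the resulting id lists can be turned into count or frequency vectors for a vector space model.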