A generator of extremely fast lexical analyzers with sophisticated input and buffer management. Many character encodings (incl. ASCII, UTF-8, UTF-16, RUSCII, ...) are directly supported. Regular expressions are specified in the lex/flex style.
* Support for Unicode and many other character encodings.
* Modes with inheritance relationships and transition rules.
* Sophisticated buffer management.
* Include stacks.
* Customized token classes.
* Template compression for code size reduction.
* Path compression for code size reduction.
* Optional indentation-based lexical analysis (INDENT, DEDENT, NODENT).
* Produces direct-coded lexical analyzers.
* Adjustable implicit line and column number counting.
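The indentation-based analysis in the feature list can be illustrated with a minimal Python sketch. This is not the generator's own API or output. The function name `indentation_tokens`, the `LINE` token, and the reading of NODENT as "a line at the same indentation as the previous one" are assumptions made for illustration only.

```python
def indentation_tokens(text):
    """Return (kind, payload) tokens for space-indented text.

    Hypothetical sketch: INDENT opens a deeper block, DEDENT closes one
    block per token, NODENT marks a line at an unchanged indentation
    level (an assumed interpretation), and LINE carries the content.
    """
    stack = [0]          # indentation widths of the currently open blocks
    tokens = []
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue     # blank lines do not change the indentation state
        width = len(line) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", ""))
        elif width == stack[-1]:
            tokens.append(("NODENT", ""))
        else:
            # Close every block that is deeper than the current line.
            while width < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", ""))
            if width != stack[-1]:
                raise ValueError("inconsistent dedent: %r" % line)
        tokens.append(("LINE", stripped))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens


if __name__ == "__main__":
    for token in indentation_tokens("a\n  b\n  c\nd\n"):
        print(token)
```

In this sketch a line that closes blocks emits only DEDENT tokens and no NODENT afterwards; a real generated analyzer may make a different choice.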
CORSIS (formerly Tenka Text) is a performance-oriented, open-source library for corpus analysis. It uses typed assembly, task-specific compilers and parallelization to deliver the best performance with an elegant design. The project's demonstration GUI comes with Wordlister, an advanced, extremely fast graphical wordlist tool, and a regex concordance tool. CORSIS: the open-source answer to WordSmith Tools.
/ Phase One : Waiting for 5+ Members to Join /
AIMS: Establish an encoding standard that breaks down sentences into language-neutral atomic elements of meaning.
PURPOSE: Aid Machine Interpretation and Translation of Texts.