Analyzed about 21 hours ago
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The engine reads a binary, grayscale, or color image and outputs plain text, ALTO, hOCR, or PDF. Tesseract can read most common image formats.
Since 2020 the Internet Archive has used Tesseract to extract text from its scanned documents.
4.05M lines of code
49 current contributors
10 days since last commit
17 users on Open Hub