The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Little work was done on it between 1995 and 2006, but it has since been improved extensively and is probably one of the most accurate open-source OCR engines available.
Tesseract reads binary, grayscale, or color images and outputs plain text, ALTO XML, PAGE XML, hOCR, or PDF. It can read most common image formats.
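As a minimal sketch, each of these output formats is selected on the `tesseract` command line by passing a config name (`txt`, `alto`, `page`, `hocr`, `pdf`) after the image and output base name. The Python wrapper below only assembles and runs that command. `FORMAT_CONFIGS`, `build_command`, and `run_ocr` are hypothetical helpers, and the sketch assumes a Tesseract 5.x binary on `PATH` (PAGE XML output requires a recent 5.x release).

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical mapping from a friendly format name to the Tesseract CLI
# config name that enables the matching output renderer.
# Assumption: Tesseract 5.x; "page" (PAGE XML) exists only in recent 5.x releases.
FORMAT_CONFIGS = {
    "text": "txt",
    "alto": "alto",
    "page": "page",
    "hocr": "hocr",
    "pdf": "pdf",
}


def build_command(image: Path, output_base: Path, formats: list[str]) -> list[str]:
    """Return a `tesseract` argv that requests every listed format in one run.

    CLI shape: tesseract IMAGE OUTPUTBASE [CONFIGFILE...]
    """
    unknown = [f for f in formats if f not in FORMAT_CONFIGS]
    if unknown:
        raise ValueError(f"unsupported output format(s): {unknown}")
    configs = [FORMAT_CONFIGS[f] for f in formats]
    return ["tesseract", str(image), str(output_base), *configs]


def run_ocr(image: Path, output_base: Path, formats: list[str]) -> None:
    """Run Tesseract; raises if the binary is missing or exits non-zero."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_command(image, output_base, formats), check=True)


if __name__ == "__main__":
    # Print (without running) the command that would write
    # out.txt, out.hocr and out.pdf from scan.png.
    print(build_command(Path("scan.png"), Path("out"), ["text", "hocr", "pdf"]))
```

Passing several config names in a single call lets Tesseract write all requested files (for example `out.txt`, `out.hocr`, and `out.pdf`) from one invocation instead of re-running recognition per format.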
Since 2020, the Internet Archive has used Tesseract to extract text from its scanned documents.
License: Apache License 2.0
Permitted: Commercial Use, Modify, Distribute, Place Warranty, Sub-License, Private Use, Use Patent Claims
Forbidden: Hold Liable, Use Trademarks
Required: Include Copyright, State Changes, Include License, Include Notice
These details are provided for information only. Nothing here is legal advice, and it should not be used as such.
There are no reported vulnerabilities.