The Tesseract OCR engine was one of the top three engines in the 1995 UNLV Accuracy test. Little work was done on it between 1995 and 2006, but it is probably one of the most accurate open source OCR engines available. It reads binary, greyscale or color images in most common image formats and outputs plain text, ALTO, hOCR or PDF.
Since 2020 the Internet Archive has used Tesseract to extract text from its scanned documents.
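As a rough illustration of the input and output formats described above, the following Python sketch assembles a call to the tesseract command-line program. It assumes the tesseract executable is installed and on the PATH, and that the installed release ships the txt, hocr, alto and pdf output configs (ALTO needs a reasonably recent release). The helper names build_tesseract_command and run_tesseract and the file names are hypothetical, chosen only for this example.

```python
import subprocess

# Output config names passed to the tesseract command line; each one
# selects an output renderer. Assumed to be present in the installed release.
SUPPORTED_FORMATS = ("txt", "hocr", "alto", "pdf")


def build_tesseract_command(image_path, output_base, formats=("txt",)):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    for fmt in formats:
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported output format: {fmt}")
    # Command shape: tesseract <image> <output base> [config names...]
    return ["tesseract", image_path, output_base, *formats]


def run_tesseract(image_path, output_base, formats=("txt",)):
    """Run tesseract; raises CalledProcessError if the program exits non-zero."""
    cmd = build_tesseract_command(image_path, output_base, formats)
    subprocess.run(cmd, check=True)
```

Under these assumptions, run_tesseract("scan.png", "scan_out", ["txt", "hocr", "pdf"]) would be expected to write scan_out.txt, scan_out.hocr and scan_out.pdf next to the hypothetical input scan.png. Building the argument list separately from running it keeps the command easy to inspect before anything is executed.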
Use Patent Claims
These details are provided for information only. Nothing here is legal advice, and it should not be used as such.