hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and the OCR-related information coexist in the same file and survive editing and manipulation. hOCR markup is independent of the presentation.
There is a public specification for the hOCR format.
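The following minimal Python sketch shows what "embedding OCR data invisibly in HTML" can look like. It assumes common hOCR conventions: elements marked with classes such as `ocr_page`, `ocr_line` and `ocrx_word`, and a `title` attribute carrying properties like `bbox` and `x_wconf` (word confidence), as produced by engines such as Tesseract. The sample markup, the `parse_title` helper and the `WordCollector` class are hypothetical illustrations, not part of any hOCR tool. Check the public specification for the authoritative list of classes and properties.

```python
from html.parser import HTMLParser

# Hypothetical hand-written hOCR fragment (not real engine output).
SAMPLE = """<html><body>
<div class="ocr_page" title="bbox 0 0 800 600">
  <span class="ocr_line" title="bbox 10 10 300 40">
    <span class="ocrx_word" title="bbox 10 10 120 40; x_wconf 93">Hello</span>
    <span class="ocrx_word" title="bbox 130 10 300 40; x_wconf 88">world</span>
  </span>
</div>
</body></html>"""


def parse_title(title):
    """Split a title like 'bbox 1 2 3 4; x_wconf 90' into {'bbox': [...], 'x_wconf': [...]}."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class WordCollector(HTMLParser):
    """Collect text, bounding box and confidence for each ocrx_word element.

    Assumption: word spans contain only text (no nested spans).
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            props = parse_title(attrs.get("title") or "")
            conf = props.get("x_wconf")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "conf": int(conf[0]) if conf else None,
            }

    def handle_data(self, data):
        # The visible text is the recognized word; OCR data lives in attributes.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == "span":
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


if __name__ == "__main__":
    collector = WordCollector()
    collector.feed(SAMPLE)
    for word in collector.words:
        print(word["text"], word["bbox"], word["conf"])
    # Hello (10, 10, 120, 40) 93
    # world (130, 10, 300, 40) 88
```

Because the layout and confidence data sit in `class` and `title` attributes, a browser renders only the words, while a parser like this one can still recover the OCR information from the same file.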
1.51K lines of code
4 current contributors
over 1 year since last commit
0 users on Open Hub