hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and OCR-related information co-exist in the same file and survives editing and manipulation. hOCR markup is independent of the presentation.
There is a Public Specification for the hOCR Format.
Commercial Use
Modify
Distribute
Place Warranty
Sub-License
Private Use
Use Patent Claims
Hold Liable
Use Trademarks
Include Copyright
State Changes
Include License
Include Notice
These details are provided for information only. No information here is legal advice and should not be used as such.
There are no reported vulnerabilities
30 Day SummaryMar 20 2024 — Apr 19 2024
|
12 Month SummaryApr 19 2023 — Apr 19 2024
|