Why Apply OCR to Historic Documents?

OCR Opens our Cultural Heritage

  • Text-based search technologies can only access old documents after OCR has been applied
    • OCR allows much better access to historic documents and books
  • OCRed historic text is easier to read
  • Documents can be converted to “modern” digital formats such as:
    • XML with metadata, such as layout information
    • Searchable PDFs
    • Ebooks
  • OCRed text can be re-used, for example:
    • reprints
    • online access

Historical Knowledge is needed for Modern Science

  • Scientists, librarians and researchers can extend information retrieval systems so that they point to, or reference, sources at a much more granular level, for example:
    • Paragraphs, sentences or words can be accessed directly, instead of “just” giving an issue number, page or paragraph
    • The required text can be found via full-text search
  • Electronic side-by-side comparison of books, documents and articles becomes possible, which benefits scientific work
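
As a rough illustration of what "granular" access can mean once OCR text is available, the following minimal sketch builds a small word index that records the page, paragraph, sentence and word position of every hit. It is not taken from any particular OCR product: the function names `build_index` and `search`, the sample pages, and the assumptions that paragraphs are separated by blank lines and sentences end with ".", "!" or "?" are all hypothetical and chosen only for this example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to its (page, paragraph, sentence, word) positions.

    `pages` is a list of plain-text pages as they might come out of OCR.
    Assumption: paragraphs are separated by blank lines.
    """
    index = defaultdict(list)
    for page_no, page_text in enumerate(pages, start=1):
        paragraphs = [p for p in re.split(r"\n\s*\n", page_text) if p.strip()]
        for para_no, paragraph in enumerate(paragraphs, start=1):
            # Naive sentence split: break after ., ! or ? followed by whitespace.
            sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
            for sent_no, sentence in enumerate(sentences, start=1):
                words = re.findall(r"\w+", sentence)
                for word_no, word in enumerate(words, start=1):
                    index[word.lower()].append((page_no, para_no, sent_no, word_no))
    return index


def search(index, word):
    """Return all positions of `word` (case-insensitive), or an empty list."""
    return index.get(word.lower(), [])


# Hypothetical sample text standing in for two OCRed pages.
pages = [
    "The press was founded in 1450. It printed many books.\n\nBroken fonts were common.",
    "Later editions used round fonts. The press closed in 1520.",
]
index = build_index(pages)
hits = search(index, "press")  # each hit is (page, paragraph, sentence, word)
```

Each hit identifies a single word inside a sentence, which is the kind of reference a full-text search over OCRed text can return instead of only a page number. A real system would need more robust sentence splitting and handling of OCR errors.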

Differences in Font Types

The following diagram shows the differences between “round” and “broken” fonts. Documents printed in “old” broken fonts look very different from those set in round fonts and are hard to read, even for humans.

Image Source: http://de.wikipedia.org/wiki/Gebrochene_Schrift

More about this topic can be found under Blackletter Fonts on Wikipedia.


Further Information

More technical details about optical character recognition (OCR) can be found on the ABBYY Developer Portal.