FineReader XIX

End of the Product Lifecycle

  • FineReader XIX was released in 2003 and
    is no longer maintained
  • Distribution and sales of the product
    stopped in September 2011
  • The successor is Recognition Server with Gothic/Fraktur support, which offers:
    • better recognition results
    • better scalability
    • more features
    • … for a better price!

First Omnifont OCR for Fraktur and Old European Scripts

ABBYY FineReader XIX is a special version of the award-winning FineReader optical character recognition (OCR) software for recognising “fraktur” or “black letter” texts from the period between 1800 and 1938. It is designed to convert scans of old documents, books, and papers into text for the purpose of digital archiving and publishing, and it is the first omnifont OCR software for Fraktur.

The Solution: First Omnifont OCR for Fraktur

ABBYY FineReader XIX is the first omnifont OCR for Fraktur, giving users a solution for scanning and converting old documents with minimal training and dictionary work. This was achieved by combining advanced recognition technology with dedicated linguistic research:

OCR systems work by analysing a text image and forming hypotheses about which letter or word each part of the image represents. These hypotheses are analysed in context and verified against sophisticated OCR dictionaries built from Language Models (LMs), computer databases that describe the vocabulary of a language. The problem is that modern OCR systems have no LMs for older spellings, and they are not built for older fonts either. The solution for Fraktur text recognition was to develop OCR dictionaries specifically for this time period: special language models were created for five European languages.
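The hypothesis-and-verification step described above can be illustrated with a minimal Python sketch. It is not ABBYY's implementation: the per-character candidate lists, the log-probability scoring, the fixed dictionary bonus, and the names `HISTORIC_LEXICON` and `best_word` are all assumptions chosen for illustration. The toy lexicon holds a few pre-1901 German spellings such as "Theil" (modern "Teil").

```python
from itertools import product
from math import log

# Hypothetical toy lexicon standing in for a historic-spelling language model
# (pre-1901 German spellings such as "Theil" for modern "Teil").
HISTORIC_LEXICON = {"theil", "thür", "noth", "seyn"}


def best_word(candidates, lexicon, bonus=2.0):
    """Pick the most likely word from per-character recognition hypotheses.

    candidates: one list per character position, each holding
    (character, probability) pairs proposed by the character classifier.
    A word found in the lexicon gets a fixed log-score bonus (an assumption).
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        # Sum of log-probabilities = log of the product of the probabilities.
        score = sum(log(p) for _, p in combo)
        if word.lower() in lexicon:
            score += bonus  # dictionary verification favours known words
        if score > best_score:
            best, best_score = word, score
    return best


# The classifier slightly prefers "c" over "e" in the third position.
candidates = [
    [("T", 0.9)],
    [("h", 0.9)],
    [("c", 0.55), ("e", 0.45)],
    [("i", 0.9)],
    [("l", 0.8)],
]

print(best_word(candidates, HISTORIC_LEXICON))  # Theil
print(best_word(candidates, set()))             # Thcil
```

With an empty lexicon the raw classifier output "Thcil" wins; with a lexicon that knows the historic spelling, the context check overrides the slightly weaker character score. A modern-only lexicon containing "teil" would not help here, which is the gap the period-specific dictionaries address.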

The Fraktur language models were created with the help of ABBYY partner ATAPY Software. During the development process, 10 different dictionaries and more than 105 books published between 1808 and 1930 were analysed. Linguists reviewed the word stock, identified words that have fallen out of use as the languages evolved, and determined the correct paradigm assignments so that the language models match the grammar used in that period. More than 500,000 word entries were manually compared with existing FineReader dictionaries.

Grammatical paradigms and changes in word forms were reviewed, and 159 historic grammar paradigms that were missing from the contemporary language models were added. The language models were then compiled and tested on a control set of documents featuring old text.
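Why paradigm assignments matter can be seen in a small Python sketch of a paradigm-based lexicon: each stem is stored once with a paradigm, and the paradigm generates the inflected forms the recogniser can accept. This is only an assumed model of how such dictionaries can work, not ABBYY's data format; the paradigm name `noun_masc_strong_hist`, its suffix list, and the helper `expand` are hypothetical.

```python
# Hypothetical paradigm table: paradigm id -> inflectional suffixes.
# The single pattern below is illustrative: a strong masculine noun in
# older German, where "-e" also marks the dative singular ("dem Tische").
PARADIGMS = {
    "noun_masc_strong_hist": ["", "es", "e", "en"],
}

# Hypothetical lexicon entries: each stem is assigned one paradigm.
ENTRIES = [
    ("Tisch", "noun_masc_strong_hist"),
    ("Theil", "noun_masc_strong_hist"),
]


def expand(entries, paradigms):
    """Generate every word form licensed by the stem/paradigm assignments."""
    forms = set()
    for stem, paradigm_id in entries:
        for suffix in paradigms[paradigm_id]:
            forms.add(stem + suffix)
    return forms


word_forms = expand(ENTRIES, PARADIGMS)
print(sorted(word_forms))
```

Because forms are generated rather than listed, a missing or wrongly assigned paradigm silently removes a whole family of word forms from the dictionary, which is why adding the historic paradigms had to precede compiling and testing the models.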

To recognise Fraktur-style fonts, ABBYY development teams created special classifiers, or alphabets, capable of recognising Fraktur symbols. As part of this effort, the teams built a symbol image base with an average of 2,500 samples per symbol, created a new alphabet pattern, and assembled a sample test base representing 31,000 pages of text from different sources. Using this test base, the recognition engine was fine-tuned to handle the subtle features of the Fraktur alphabet, such as ligatures (connected letters). The new alphabet was then added to FineReader XIX and its interface and tested extensively.
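The idea of training a classifier from many samples per symbol can be sketched with a nearest-centroid classifier in Python. This is a deliberately simple stand-in technique, not the classifier ABBYY used, which the text does not describe. The two features (crossbar width and ascender height) and the sample values are hypothetical; a real system would extract far richer features from each symbol image.

```python
from statistics import fmean


def train_centroids(samples):
    """Average the feature vectors of each symbol class into one centroid.

    samples: dict mapping a symbol label to a list of equal-length
    feature vectors extracted from its sample images.
    """
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in samples.items()
    }


def classify(vector, centroids):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""

    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))

    return min(centroids, key=lambda label: dist2(centroids[label]))


# Hypothetical features: (crossbar width, ascender height), both scaled to 0..1.
# In Fraktur, "f" has a crossbar on both sides while the long s "ſ" has at
# most a short spur on the left, so crossbar width separates the two.
samples = {
    "f": [[0.9, 1.0], [0.8, 0.95], [1.0, 1.0]],
    "ſ": [[0.2, 1.0], [0.1, 0.9], [0.3, 1.0]],
}
centroids = train_centroids(samples)

print(classify([0.75, 0.97], centroids))  # f
print(classify([0.25, 0.98], centroids))  # ſ
```

More samples per symbol make each centroid a more stable estimate of that symbol's typical shape, which is the intuition behind collecting thousands of images for every Fraktur character.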

Further Information: