The Challenge: Digitising Old Texts
Black letter fonts, also known as “Gebrochene Schriften” or broken scripts, first emerged as early as the 12th century and evolved over time into a variety of derived styles and typefaces.
Common characteristics and peculiarities of the type include the long s (ſ) and ligatures, or “joined” letters, for certain letter combinations. Because the type was used so frequently, an understanding of Fraktur is essential for studying texts and developing recognition technologies for the period between 1800 and 1938.
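To make the long s and ligatures concrete, here is a minimal Python sketch of how a transcription containing these historical glyphs could be mapped to modern spelling, for example to make recognised text searchable. The mapping table and the function name `modernise` are assumptions made for illustration only. They are not part of any ABBYY product, and the table covers only glyphs that have their own Unicode code points; typical Fraktur ligatures such as ch, ck and tz are not included.

```python
# Illustrative mapping from historical glyphs to modern letters.
# Hypothetical helper for this article, not an ABBYY API.
HISTORIC_GLYPHS = {
    "\u017f": "s",   # long s (ſ)
    "\ufb00": "ff",  # ff ligature
    "\ufb01": "fi",  # fi ligature
    "\ufb02": "fl",  # fl ligature
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",  # st ligature
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(HISTORIC_GLYPHS)


def modernise(text: str) -> str:
    """Replace the long s and common ligatures with modern letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # "Waſſer" written with two long s characters
    print(modernise("Wa\u017f\u017fer"))  # prints "Wasser"
```

Whether to keep the original glyphs or normalise them depends on the use case: scholarly editions usually preserve them, while full-text search tends to benefit from the modern form.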
A Solution from ABBYY: Standard OCR vs. "Gothic/Fraktur" OCR
*Processed with ABBYY Recognition Server, with Gothic/Fraktur recognition enabled and disabled
The sample clearly shows that specially tuned and optimised recognition technologies are needed when processing historical documents printed in old fonts.
The same, of course, applies when “old” and “modern” fonts are mixed.
IMPACT Centre of Competence
… is a new, non-profit organisation with the mission to make the digitisation of historical printed text “better, faster, cheaper”. It will provide tools, services and facilities to further advance the state of the art in the fields of document imaging, language technology and the processing of historical text.