BiblioTECA at Work



Stages in document processing: some examples

The following examples illustrate document processing in BiblioTECA. They zoom in on the process shown in Diagram 1 to highlight its main stages.

From image to text

BiblioTECA's starting point is an image such as the following table of contents from an issue of 'The Journal of Symbolic Logic':

Diagram 2

The Intelligent Document Recognition (IDR) module processes these images, which are produced by scanning printed documents. The essential aim of IDR is to read the characters in an image automatically. It analyses the image and outputs a list of the document's characters. This list may also contain hints about typographical traits and other features related to the hierarchical model used in the IDR analysis. See the IDR module section below for a more detailed explanation of IDR data flow and characteristics.

Because a 'perfect' reading of the text is almost impossible to obtain, IDR processing has an associated correction module called Videocodage. It is an ergonomic, menu-driven Windows module that makes correcting the reading easy by presenting the original images, their reading and the correction suggestions together.
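The correction step can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the names `CharReading`, `positions_to_review` and `apply_corrections` are hypothetical and are not part of Videocodage or IDR, and the idea that an ambiguous character is one carrying alternative readings is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CharReading:
    """One recognized character plus alternative readings (hypothetical model)."""
    best: str
    alternatives: List[str] = field(default_factory=list)


def positions_to_review(readings: List[CharReading]) -> List[int]:
    """Return the positions whose reading is ambiguous, i.e. has alternatives."""
    return [i for i, r in enumerate(readings) if r.alternatives]


def apply_corrections(readings: List[CharReading],
                      corrections: Dict[int, str]) -> str:
    """Build the corrected text; `corrections` maps a position to the
    character chosen by the operator, other positions keep the best reading."""
    return "".join(corrections.get(i, r.best) for i, r in enumerate(readings))


# A made-up reading of the word "Logic" with two doubtful characters.
readings = [
    CharReading("L"),
    CharReading("0", ["o", "O"]),
    CharReading("g"),
    CharReading("1", ["i", "l"]),
    CharReading("c"),
]

print(positions_to_review(readings))                   # [1, 3]
print(apply_corrections(readings, {1: "o", 3: "i"}))   # Logic
```

In this sketch the operator only has to look at positions 1 and 3, which mirrors the idea of showing the reading next to its correction suggestions.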

IDR outputs three kinds of data structure, readable by the AFCA module:

  1. MCS hierarchical model based structure.
  2. ASCII text.
  3. ASCII tagged.
The first contains the complete character reading, including information about words, lines and 'areas of interest', and offers alternative character readings when appropriate. It is the richest structure and, as a consequence, the most demanding for the AFCA module: the LENDEX grammar should take this richer structure into account. The second contains plain ASCII text, and the third contains a tagged version of the ASCII text that annotates bold, italic and other font features. AFCA can read files formatted according to any of these structures, including, of course, ASCII text files originating elsewhere.
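The relation between the three structures can be sketched in Python. This is an illustrative model only: the class names (`Char`, `Word`, `Line`, `Area`), the `style` field and the `<b>`/`<i>` tag syntax are assumptions made for the example, not the actual MCS format or the tag set that IDR emits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Char:
    best: str                                   # preferred reading
    alternatives: List[str] = field(default_factory=list)


@dataclass
class Word:
    chars: List[Char]
    style: str = "plain"                        # hypothetical: "plain", "bold", "italic"


@dataclass
class Line:
    words: List[Word]


@dataclass
class Area:
    label: str                                  # hypothetical 'area of interest' name
    lines: List[Line]


def word_text(word: Word) -> str:
    """Join the preferred reading of every character in a word."""
    return "".join(c.best for c in word.chars)


def to_ascii(areas: List[Area]) -> str:
    """Flatten the hierarchy to plain text, one output line per Line."""
    return "\n".join(
        " ".join(word_text(w) for w in line.words)
        for area in areas for line in area.lines
    )


def to_tagged(areas: List[Area]) -> str:
    """Flatten the hierarchy to text with (assumed) font tags around styled words."""
    tags = {"bold": "b", "italic": "i"}

    def render(word: Word) -> str:
        tag = tags.get(word.style)
        text = word_text(word)
        return f"<{tag}>{text}</{tag}>" if tag else text

    return "\n".join(
        " ".join(render(w) for w in line.words)
        for area in areas for line in area.lines
    )


def make_word(text: str, style: str = "plain") -> Word:
    """Helper for the example: build a Word with no alternative readings."""
    return Word([Char(ch) for ch in text], style)


areas = [Area("title", [Line([make_word("Symbolic", "italic"), make_word("Logic")])])]

print(to_ascii(areas))    # Symbolic Logic
print(to_tagged(areas))   # <i>Symbolic</i> Logic
```

The sketch shows why the hierarchical structure is the most demanding to consume: the two ASCII forms can be derived from it by discarding information, but not the other way round.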


