BiblioTECA Implementation


IDR module

As noted earlier, IDR performs character recognition on previously scanned document images. A hierarchical data model underlies the IDR module design. The model can be represented as a tree with the image to be read at the top, the grapheme at the lowest level, and Area, Line and Word as intermediate levels.
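The hierarchy described above can be pictured with a short sketch. This is only an illustration, not the IDR implementation: the class names follow the levels named in the text, while the fields (`char`, `confidence`, `name`) and the helper `make_word` are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Grapheme:
    """Lowest level of the tree: one recognised character."""
    char: str
    confidence: float = 1.0  # hypothetical field, not taken from the source


@dataclass
class Word:
    graphemes: List[Grapheme] = field(default_factory=list)

    def text(self) -> str:
        return "".join(g.char for g in self.graphemes)


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

    def text(self) -> str:
        return " ".join(w.text() for w in self.words)


@dataclass
class Area:
    lines: List[Line] = field(default_factory=list)

    def text(self) -> str:
        return "\n".join(line.text() for line in self.lines)


@dataclass
class Image:
    """Top of the tree: the scanned image to be read."""
    name: str
    areas: List[Area] = field(default_factory=list)

    def text(self) -> str:
        return "\n\n".join(area.text() for area in self.areas)


def make_word(s: str, confidence: float = 1.0) -> Word:
    """Hypothetical helper: build a Word from a plain string."""
    return Word([Grapheme(c, confidence) for c in s])


page = Image("page-001", [Area([Line([make_word("hello"), make_word("world")])])])
print(page.text())
```

Walking from the image down to a single grapheme is then a matter of following the nested lists, which is what makes it possible to inspect every object related to an image.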

The module contains two submodules: a) IDR sensu stricto, which implements automatic character recognition, and b) Videocoding, a word and character correction module.

The data processing inside this module could be represented as follows:

Diagram 2

Images are processed in batch. The functionality of the module is organised around the MCS data model, which makes it possible to inspect, for each image, every object related to it.

The interface includes a Document Structure Dialogue for setting the language, column detection, resolution, borders and spacing between documents. IDR can also handle multi-column documents.
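As an illustration of the kind of settings such a dialogue collects, the sketch below groups them into one validated record. All field names, units and default values are assumptions made for the example; the source lists only the parameter categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentStructure:
    """Hypothetical container for the dialogue's parameters."""
    language: str = "en"            # recognition language (assumed code format)
    detect_columns: bool = True     # enable multi-column detection
    resolution_dpi: int = 300       # scan resolution (assumed unit: dpi)
    border_px: int = 0              # border width to remove (assumed unit: pixels)
    document_spacing_px: int = 0    # spacing between documents (assumed unit: pixels)

    def __post_init__(self) -> None:
        # Reject values that cannot describe a real scan.
        if self.resolution_dpi <= 0:
            raise ValueError("resolution_dpi must be positive")
        if self.border_px < 0 or self.document_spacing_px < 0:
            raise ValueError("border and spacing must not be negative")


settings = DocumentStructure(language="it", resolution_dpi=400)
print(settings)
```

Keeping the record immutable means one set of parameters can safely be shared by every image in a batch.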

During image pre-processing, borders are removed, the image quality is assessed and noise is removed. The next phase is segmentation, in which line height, inter-word distance and inter-letter distance are evaluated precisely, and word distances are detected dynamically.
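The source does not say which segmentation algorithm IDR uses. As a rough illustration of how line and word boundaries can be derived from a binarised image, the sketch below uses projection profiles, a common textbook technique substituted here for the unspecified method. The function names and the fixed `min_gap` threshold are assumptions; a dynamic approach would estimate that threshold from the gaps themselves.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start, end) with end exclusive


def find_runs(profile: List[int]) -> List[Run]:
    """Return the index ranges where the profile is non-zero."""
    runs: List[Run] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: List[List[int]]) -> List[Run]:
    """Find text lines as runs of rows that contain ink (1 = ink, 0 = background)."""
    row_profile = [sum(row) for row in image]
    return find_runs(row_profile)


def segment_words(image: List[List[int]], top: int, bottom: int, min_gap: int) -> List[Run]:
    """Split one line into words: column gaps narrower than min_gap count as inter-letter gaps."""
    width = len(image[0])
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    words: List[Run] = []
    for start, end in find_runs(col_profile):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)  # small gap: same word
        else:
            words.append((start, end))       # large gap: new word
    return words


image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
]
for top, bottom in segment_lines(image):
    print((top, bottom), segment_words(image, top, bottom, min_gap=2))
```

The line height falls out of each `(top, bottom)` pair, and the gaps between successive column runs give the inter-letter and inter-word distances.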

The Optical Character Recognition phase uses an external library with multifont recognition capabilities.

Dictionaries support the analysis of the linguistic context of recognised text and also provide alternatives and suggestions for correction.
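To give an idea of how a dictionary can supply correction suggestions, the sketch below uses approximate string matching from Python's standard `difflib` module. This is an assumed stand-in: the source does not describe the matching method IDR uses, and the function name `suggest`, the sample word list and the cutoff value are hypothetical.

```python
import difflib
from typing import List


def suggest(word: str, dictionary: List[str], max_suggestions: int = 3,
            cutoff: float = 0.6) -> List[str]:
    """Return dictionary entries close to `word`, or an empty list if it is already known."""
    candidate = word.lower()
    if candidate in dictionary:
        return []  # the word is in the dictionary, nothing to suggest
    # get_close_matches ranks candidates by similarity ratio, best first.
    return difflib.get_close_matches(candidate, dictionary,
                                     n=max_suggestions, cutoff=cutoff)


dictionary = ["library", "librarian", "liberty"]
print(suggest("libnary", dictionary))
```

A recognised word that is absent from the dictionary is a natural candidate for review, and the ranked matches can be offered to the operator as alternatives.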

So the main IDR features are:

A user manual has been written for the IDR module.

Videocoding: the correction submodule

It is both menu- and toolbar-driven and accepts keyboard shortcuts. The main window shows the original document and focuses on the words that IDR had trouble recognising. Correction is performed at word level, which allows either the whole word or part of it to be corrected.
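The word-level correction described above can be sketched as two small steps: pick out the words that need attention, then replace a whole word or a slice of it. The confidence scores, the threshold and both function names are assumptions for illustration; the source does not state how problem words are identified.

```python
from typing import List, Optional, Tuple


def words_needing_review(words: List[Tuple[str, float]], threshold: float = 0.8) -> List[int]:
    """Return the positions of words whose (hypothetical) confidence is below the threshold."""
    return [i for i, (_, confidence) in enumerate(words) if confidence < threshold]


def correct_word(word: str, replacement: str, start: int = 0,
                 end: Optional[int] = None) -> str:
    """Replace word[start:end] with `replacement`; the defaults replace the whole word."""
    if end is None:
        end = len(word)
    return word[:start] + replacement + word[end:]


recognised = [("the", 0.99), ("rnodule", 0.42), ("reads", 0.95)]
for index in words_needing_review(recognised):
    text, _ = recognised[index]
    # Partial correction: the first two characters "rn" were a misread "m".
    print(text, "->", correct_word(text, "m", start=0, end=2))
```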

Additional information on IDR

The following list of deliverables has been produced:

Deliverable                      Type
IDR Software                     Software
Videocoding Software             Software
Videocoding User Manual          User Manual
IDR User Manual                  User Manual
IDR Algorithmic Documentation    Technical Report
IDR Technical Documentation      Technical Report
