
A problem

Many information resources in libraries are second order: documents about documents. Some are an integral part of primary documents: indexes, tables of contents, etc. Others result from librarians' information processing: catalogue cards, printed catalogues, and so on. These resources, usually the product of many years of work, are a fundamental asset for libraries: they represent the output of the long, painstaking and careful manual information processing on which libraries' services are based.

The increasing presence of computerised library environments means that either most of this information processing must be repeated, or tools must be developed that allow this information to be transferred to automated information processing systems.

BiblioTECA is an answer along the second line of thought. Its main concern is how to make such information (couched in different media, coded according to different codes) accessible to a computerised library environment.

The transfer problem has two sides:

  1. Physical: How to transfer characters from printed or typewritten paper to magnetic media?

  2. Logical: How to define systems that 'translate' the information to a different coding system? How to transfer information coded for human use into machine-workable structures?

The physical aspect is clearly defined but not so easily solved. Optical Character Recognition (OCR) and its top accuracy figures (hit rates of 99% or higher are not unusual) are widely known today. The very restrictive conditions under which these rates are obtained are less well known.

The logical aspect of the problem is media independent. A coding and decoding problem arises just as much when you transfer data from, say, a dBASE III database to an Access database as when you translate ISBD bibliographic data to UKMARC. The same is true of translating a table of contents into a representation a computer can work with. Of course, the more informal or less well defined the original coding is, the more difficult the transfer becomes.
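To make the coding and decoding problem concrete, the following minimal sketch decodes one ISBD-style entry into named fields. It is only an illustration, not BiblioTECA's method: the pattern covers a single simplified layout (title / statement of responsibility, then place : publisher, date), the area separator is written here as ". -- ", and the field names, the function `decode_isbd` and the sample entry are all hypothetical.

```python
import re

# Assumed, simplified layout (hypothetical, not a full ISBD grammar):
#   title / responsibility. -- place : publisher, date
ISBD_PATTERN = re.compile(
    r"^(?P<title>.+?) / (?P<responsibility>.+?)\. -- "
    r"(?P<place>.+?) : (?P<publisher>.+?), (?P<date>\d{4})\.?$"
)


def decode_isbd(entry: str) -> dict:
    """Turn punctuation-coded text into a field-name -> value mapping."""
    match = ISBD_PATTERN.match(entry.strip())
    if match is None:
        # Informal or damaged coding does not fit the pattern at all.
        raise ValueError(f"entry does not fit the assumed pattern: {entry!r}")
    return match.groupdict()


# Invented sample entry, for illustration only.
record = decode_isbd("A sample title / A. N. Author. -- London : Example Press, 1990")
for name, value in record.items():
    print(f"{name}: {value}")
```

An entry that deviates from the assumed punctuation raises an error instead of being decoded, which is the difficulty described above: the less strictly the source is coded, the less a fixed pattern of this kind can recover.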

A solution: BiblioTECA

BiblioTECA addresses both sides of the problem. The system's primary field of application is printed reference documents, although AFCA procedures can be applied (and actually have been applied) to documents already stored on magnetic media.

BiblioTECA's immediate aim is to capture the information in such reference documents, to process it and to represent it in SGML format.

From such a standard coding, translation to proprietary formats should be an easy task.

There are many possible applications for the type of analysis proposed in BiblioTECA: MARC coding or other formatting in retrospective catalogue conversion, definition of lexica, thesauri and similar works, creation of databases from printed and/or non-printed material, repositories of papers from scientific or other periodicals, conversion of printed catalogues to magnetic media, etc.

BiblioTECA's intended use, then, is to allow easy access to information originally expressed in different media and/or formats, in order to enhance bibliographic references in an automated library environment.

These goals define some important requirements on BiblioTECA:

  1. Several classes of documents, each with a different structure, should be treated already at the prototype testing stage.
  2. BiblioTECA should in some way mimic some of the skills involved in human information processing: uncovering text structure, filling in information gaps, correcting input errors and adapting easily to changes in the structure of documents, among others.
  3. Standard OCR output is far from perfect when used on poorly printed documents, so its performance should be improved.

Our proposed solution has been to devise a transfer system that integrates reading and analysis to a certain extent. A flexible document description language lets the user define a document's structure easily. The system interprets this description as a parser for a set of documents, which offers a solution to requirements 1 and 2 above. A degree of integration between character recognition and analysis improves the output of automatic reading, which addresses requirement 3.
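The idea of a description that is interpreted as a parser, with analysis feeding back into reading, can be sketched roughly as follows. This is an illustration under stated assumptions, not BiblioTECA's description language, which is not shown here: the `Field` class, the `INDEX_ENTRY` description, the table of character confusions and the SGML-style output are all hypothetical.

```python
from dataclasses import dataclass

# Assumed set of common OCR confusions; applied only to fields declared numeric.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


@dataclass(frozen=True)
class Field:
    name: str        # element name used in the tagged output
    kind: str        # "text" or "digits"
    terminator: str  # literal string that ends the field ("" means end of line)


# Hypothetical description of a table-of-contents line: heading, dot leaders, page.
INDEX_ENTRY = [Field("heading", "text", " .."), Field("page", "digits", "")]


def make_parser(description):
    """Interpret a document description as a parser for lines of that shape."""
    def parse(line):
        rest, record = line.strip(), {}
        for field in description:
            if field.terminator:
                value, found, rest = rest.partition(field.terminator)
                if not found:
                    raise ValueError(f"missing terminator for {field.name!r}")
            else:
                value, rest = rest, ""
            value = value.strip(" .")  # drop dot leaders and padding
            if field.kind == "digits":
                # Knowing the field must be numeric lets us repair misread characters.
                value = value.translate(DIGIT_FIXES)
                if not value.isdigit():
                    raise ValueError(f"{field.name!r} is not numeric: {value!r}")
            record[field.name] = value
        return record
    return parse


def to_tagged(record, element="entry"):
    """Emit SGML-style tags (no DTD and no character escaping in this sketch)."""
    inner = "".join(f"<{k}>{v}</{k}>" for k, v in record.items())
    return f"<{element}>{inner}</{element}>"


parse_entry = make_parser(INDEX_ENTRY)
tagged = to_tagged(parse_entry("Introduction .......... l2"))
print(tagged)
```

In this sketch, handling a different class of document means writing a new description rather than a new program, and the expected field type is what allows a misread "l2" to be corrected to "12".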
