Today I posted a 500GB hard drive to India, where the TIFF files on it will be converted into HTM and PDF files. This process, known as OCR (Optical Character Recognition), turns static page images into fully searchable text. The HTM file is what users see when a transcript page is displayed; it is coded in XHTML so that its tables and text can be manipulated.
The OCR process takes a good while, and the NLS demands high accuracy rates, but the results are worth it.
For more on OCR, please see the IMPACT Centre of Competence Digitisation website (http://www.digitisation.eu/), which was launched at the end of October, and the ABBYY website (http://finereader.abbyy.com/about_ocr/whatis_ocr/).