Assistive Technology for Visually Impaired and Blinde People_Hersh,Jonson_2008.pdf


Figure 15.1. Printed document access challenge

decreasing costs, compared to past decades. The barriers to print access are coming down, and the technology roadmap for future advances is reasonably clear. More and more visually impaired people are driving their own access to information. Let us look more closely at the technology that brings reading and independent literacy to people with visual disabilities.

15.2 Basics of Optical Character Recognition Technology

Delivering the actual text and structure of a document is increasingly the core requirement for text access. Having a sighted person intervene, by reading books, newspapers and documents aloud or transcribing them into Braille, is on the wane because of the expense and the delay in production. It is ironic that, in a world where almost all documents are generated electronically, access to those electronic documents is so difficult! Authors and publishers of books are concerned about piracy and are reluctant to make books easily available in electronic form, although they rarely object to access for people with disabilities. Reaching the right person to grant that access, however, is logistically difficult. Solutions are needed that provide direct access to books.

Optical character recognition (OCR) systems meet this need. They recreate the document in electronic form from a scanned image of it. A desktop image scanner optically scans each page, taking a digital picture of the page.

The scanner is increasingly a standard piece of equipment in offices and home computing setups. A very popular product is the combined scanner/printer, which can also act as a copier. The most frequent configuration is a flat glass platen on which the document or book is placed. A scanning bar with a lamp moves under the glass plate, illuminating and imaging the page line by line, typically representing the image as an array of pixels at a resolution of 300 dots per inch (118 dots per centimetre). This scanned image file is transferred to a computer for additional processing. The scanned image of the page is quite useful for tasks such as sending a facsimile or making a photocopy.
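To make the "array of pixels" concrete, the short Python sketch below works out the 300 dpi to dots-per-centimetre conversion and the size of the pixel array for a whole page. The page sizes (US Letter, 8.5 × 11 inches, and A4, 210 × 297 mm) and the function names are illustrative assumptions, not part of any scanner's software.

```python
# Sketch: how large is the pixel array produced by a 300 dpi flatbed scan?
# The 300 dpi figure comes from the text; page sizes and names are illustrative.

CM_PER_INCH = 2.54


def dots_per_cm(dpi: float) -> float:
    """Convert a resolution in dots per inch to dots per centimetre."""
    return dpi / CM_PER_INCH


def page_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Return (columns, rows) of the pixel array for a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    print(f"300 dpi = {dots_per_cm(300):.1f} dots per cm")   # 118.1
    w, h = page_pixels(8.5, 11)                               # US Letter
    print(f"US Letter: {w} x {h} = {w * h:,} pixels")         # 2550 x 3300 = 8,415,000
    w, h = page_pixels(210 / 25.4, 297 / 25.4)                # A4, converted from mm
    print(f"A4: {w} x {h} = {w * h:,} pixels")                # 2480 x 3508 = 8,699,840
```

At 300 dpi a single page therefore becomes a grid of several million pixels before any recognition begins.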

However, the image cannot be directly used to generate Braille or synthetic voice output. OCR systems analyze the picture of the page, find the letters and words, and generate a text file with words, paragraphs and pages. This text file can be turned into Braille or sent to a voice synthesizer, and thus made accessible.

Imagine a business letter. A photograph of that letter can be used to reproduce the letter on a printed page or on a screen. However, the picture of the page will not be accessible to a blind person at all. Seeing the picture of the words is not the same as understanding the words, and a computer is able to communicate the words to a blind person only after the computer has identified the words within the picture.

OCR technology turns that picture into words (see Figure 15.2). On a personal computer a word processing program can then edit the letter, just as if someone had retyped that business letter. Because the words are available in the word processing program, a specialized computer program called a screen reader can read them aloud or send them to a Braille display or printer.

The OCR process often makes mistakes. Identifying the words from a picture of a page can be difficult, especially if the document is of poor quality or the print is small. The OCR technology breaks the picture of the page down into lines of text, and then further subdivides the picture into words and letters. By analyzing the picture of a letter and where it sits relative to the line, the OCR can usually tell which letter it is (for example, a capital ‘P’ vs a lower-case ‘p’). The OCR then builds up words, lines, paragraphs and pages. Understanding how the OCR process operates can help in recognizing its limitations.
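The position test mentioned above can be sketched as follows. It assumes that the baseline of each text line and its x-height (the top of letters such as ‘x’) have already been estimated as pixel rows; the class `LineMetrics`, the function `vertical_class` and the 2-pixel tolerance are hypothetical. Because ‘P’ and ‘p’ have almost the same shape once scaled to a common size, their position relative to these two rows is what separates them.

```python
from dataclasses import dataclass


@dataclass
class LineMetrics:
    """Vertical reference rows for one line of text (y grows downward)."""
    baseline: int  # row on which most letters rest
    x_height: int  # row reached by the tops of letters such as 'x', 'o', 'p'


def vertical_class(top: int, bottom: int, line: LineMetrics, tol: int = 2) -> str:
    """Describe where a glyph's bounding box sits relative to the line.

    Hypothetical helper: `top` and `bottom` are the glyph's first and last
    ink rows; `tol` absorbs small scanning and printing irregularities.
    """
    descends = bottom > line.baseline + tol   # ink below the baseline
    tall = top < line.x_height - tol          # ink above the x-height
    if descends and not tall:
        return "x-height with descender"      # e.g. 'p', 'g', 'q'
    if tall and not descends:
        return "cap or ascender height"       # e.g. 'P', 'b', 'd'
    if tall and descends:
        return "full height"                  # e.g. brackets
    return "x-height"                         # e.g. 'o', 'x'


line = LineMetrics(baseline=100, x_height=85)
print(vertical_class(70, 100, line))  # cap or ascender height  (a 'P')
print(vertical_class(85, 112, line))  # x-height with descender (a 'p')
```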

15.2.1 Details of Optical Character Recognition Technology

OCR technology imitates the human perception process of visual reading. This technology has steadily progressed over the last 50 years, but it is still not the equal of human readers. OCR uses a methodical approach to analyzing what is on a page. A typical OCR process involves the following steps:


Figure 15.2. OCR process

Step 1. Adjusting the contrast of the page image

Step 2. Removing speckle and other image noise from the image of the page

Step 3. Identifying if the text is sideways or upside down (or if the page is blank)

Step 4. De-skewing the page (straightening it if it was scanned at a slight angle)

Step 5. Finding blocks of text on the page

Step 6. Finding the lines in the text blocks

Step 7. Identifying the baseline of the line of text (to distinguish between capital and lower-case letters)

Step 8. Isolating the pictures of individual words and letters

Step 9. Recognizing the letters by analyzing their features

Step 10. Assembling words from the recognized letters

Step 11. Resolving uncertain letters or words, using linguistic rules and dictionaries

Step 12. Reconstructing the lines, paragraphs and page in the desired format (for example, ASCII, Microsoft Word or RTF)
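Finding lines (Step 6) is often explained with a horizontal projection profile: count the ink pixels in every row of a de-skewed binary image and treat each run of rows containing ink as one line of text. The Python sketch below illustrates only that idea; it is one common textbook technique, not necessarily what commercial OCR engines do, and the name `find_text_lines` and the `min_ink` parameter are hypothetical. It also suggests why de-skewing (Step 4) comes first: on a tilted page, the ink of neighbouring lines spreads into the same rows and the blank gaps between lines can disappear.

```python
def find_text_lines(page, min_ink=1):
    """Return (first_row, last_row) for each horizontal band of rows containing ink.

    `page` is a 2-D list of 0/1 values (1 = ink), assumed already de-skewed
    so that lines of text run horizontally. `min_ink` is the number of ink
    pixels a row needs before it counts as part of a line (a hypothetical knob).
    """
    profile = [sum(row) for row in page]   # horizontal projection profile
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                       # top edge of a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))    # bottom edge reached
            start = None
    if start is not None:                   # a line runs to the bottom of the page
        lines.append((start, len(profile) - 1))
    return lines


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(find_text_lines(page))  # [(1, 2), (4, 4)]
```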

It is interesting to note that Step 3 above, where the orientation of the page is detected by the computer, was not a standard component of early commercial OCR systems, because it was assumed that a person placing the page on the scanner could see whether or not the text was upside down before scanning. After being made aware of this need and seeing early implementations of it by organizations making reading systems for the blind, the OCR vendors added this accommodation so that an image scanned in the wrong orientation would be detected and digitally rotated into the correct orientation before the OCR proceeded. Now this is a standard feature of all the commercial OCR packages, and sighted people benefit from it as well, since they can now place a page on a scanner without checking that it is in the “correct” orientation.

As the old computing dictum “garbage in, garbage out” (also known as “rubbish in, rubbish out”) warns, poor-quality images lead to inaccurate OCR. One of the major technology efforts in character recognition is improving the quality of the image being recognized. Good OCR depends on a good-quality scanner to capture an accurate image of the text. Cameras, hand-operated scanners and inexpensive sheet-fed scanners (where the page is moved past a fixed line sensor) generally fall short of providing the best-quality images for OCR. Flatbed scanners and higher-end sheet-fed scanners reliably provide good page scans.

The standard image scanning resolution of 300 dpi/118 dpc is almost always sufficient for standard text documents. A typical small character scanned at this resolution is roughly 20–30 pixels high, which is enough for the OCR to distinguish similar-looking characters (such as ‘e’ vs ‘c’). Figure 15.3 shows examples of individual character images of varying quality.
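The 20–30 pixel figure can be checked with a little arithmetic. Type is measured in points, here taken as the desktop-publishing point of 1/72 inch, so at 300 dpi the em square of a font is point size × 300 / 72 pixels high. How much of the em a given letter occupies depends on the typeface; the ratios used below (0.7 for capitals, 0.48 for the x-height) are assumed round values for a typical text face, and the helper names are hypothetical.

```python
POINTS_PER_INCH = 72  # desktop-publishing (PostScript) point


def em_height_px(point_size: float, dpi: int = 300) -> float:
    """Height of the font's em square in pixels at the given scan resolution."""
    return point_size * dpi / POINTS_PER_INCH


def glyph_height_px(point_size: float, ratio: float, dpi: int = 300) -> float:
    """Approximate pixel height of a glyph that spans `ratio` of the em.

    `ratio` is an assumed typeface property (roughly 0.7 for capitals and
    about 0.45-0.5 for the lower-case x-height in many text faces).
    """
    return em_height_px(point_size, dpi) * ratio


for pts in (8, 10, 12):
    caps = glyph_height_px(pts, 0.7)
    xh = glyph_height_px(pts, 0.48)
    print(f"{pts:>2} pt: em {em_height_px(pts):.1f} px, "
          f"capitals ~{caps:.0f} px, x-height ~{xh:.0f} px")
```

Under these assumptions, 10-point type has an em of about 42 pixels and capitals of about 29 pixels, and 8-point capitals come out at about 23 pixels, consistent with the range quoted above.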

Processing the scanned image before recognition is essential to remove the garbage. Contrast is a critical parameter, especially in modern documents (such as magazines) with text printed on complex, and often multicoloured, backgrounds. Character recognition engines need only the characters and must reject the background and picture content. This need competes with other imaging tasks where the goal is to render the original document accurately on the computer display. Some scanning processes use a binary contrast threshold set manually, whereas others add automatic contrast technology or process a colour or grey-scale image of the document. Binary document images keep just one bit of data per pixel: whether it is black or white. Grey images typically keep 8 bits of data per pixel, which gives 256 levels (0 to 255) from black to white. Colour images are often scanned at 24 bits per pixel, 8 bits for each of 3 colours. For OCR, these are generally reduced to binary images through processing.
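Reducing a grey image to a binary one comes down to choosing a grey level that separates ink from paper. One well-known way to choose it automatically is Otsu's method, sketched below in Python for 8-bit grey images. The text does not say which method any particular product uses, so treat this as an illustration of the idea rather than a description of a real scanner driver; the function names are hypothetical. A single global threshold like this can struggle with the multicoloured magazine backgrounds mentioned above, which is why locally adaptive thresholds are also used.

```python
def otsu_threshold(pixels):
    """Return the grey level t that best separates dark ink from light paper.

    `pixels` is a flat list of 8-bit grey values (0 = black, 255 = white).
    Pixels <= t count as ink. Otsu's method picks the t that maximises the
    between-class variance of the two resulting groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0  # pixel count and grey-value sum at or below t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: not a real split
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(image):
    """Convert a 2-D list of grey values into 1 (ink) / 0 (paper)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


scan = [[30, 200, 210], [40, 220, 35]]
print(binarize(scan))  # [[1, 0, 0], [1, 0, 1]]
```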

The other image processing steps in OCR are designed to improve the image after scanning. Despeckling removes noise: the little flecks of ink, paper imperfections or scanning quirks that might otherwise show up as punctuation in odd places on the reproduced pages. Orienting and straightening the page improves OCR accuracy, including the recognition of special font characteristics such as italics. Some OCR software can distinguish blocks of text from blocks of images on a page and apply the appropriate processing to each type of block.
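Despeckling can be sketched as removing connected groups of ink pixels that are too small to belong to a character. The hypothetical `despeckle` function below finds each 8-connected blob with a breadth-first search and erases those smaller than an arbitrary `min_size` of 3 pixels. Choosing the threshold is the hard part: set too high, it also erases genuine full stops and the dots on ‘i’ and ‘j’, so a practical system would need to weigh a blob's position and context as well as its size.

```python
from collections import deque


def despeckle(binary, min_size=3):
    """Erase 8-connected groups of ink pixels smaller than `min_size` pixels.

    `binary` is a 2-D list with 1 = ink, 0 = paper; a cleaned copy is returned.
    """
    if not binary:
        return []
    rows, cols = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected blob of ink.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(blob) < min_size:
                for y, x in blob:
                    out[y][x] = 0  # too small to be part of a character: treat as noise
    return out


img = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(despeckle(img))  # the lone pixel at the top left is removed; the 2x2 block stays
```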

The middle part of the process is the actual recognition of a character or glyph. For example, in typography the letters ‘fl’ are typically printed as a single ligature rather than as two separate letters, so the OCR treats this as another type of character to be recognized. By processing the text blocks, lines of text and words in successively smaller units, the OCR engine is eventually presented with a single character image to recognize. It generally does this by transforming the character image into a set of features. In the original OCR engines, this feature set might be as simple as normalizing the image onto a 5×9 grid and using which grid elements were on or off as the features. A very typical feature used in modern OCR is aspect ratio: the ratio of the height to the width of a character. For example, in many feature sets