
12.4 Audio-transcription of Printed Information

Blind and visually impaired people require access to a wide variety of printed information, including books, newspapers, menus and timetables. One of the earliest approaches to making print accessible was the talking book. This involved making a recording of the book being read, generally by volunteer readers. Once the original recording was made, multiple copies were produced, originally on tape and more recently on cassette or CD, and distributed to be played on an appropriate player. This approach has also been used to produce talking newspapers. Its main advantage is that the recording sounds natural, since a person, rather than synthetic speech, has been used to produce it. Its drawbacks are that it is time intensive, and expensive if the recordings are made by paid staff. In addition, it is most suitable for items, such as books, with a stable text that will be used for an extended period. It is less practicable for items such as menus, timetables and theatre programmes that change frequently.

This gives rise to the need for reading systems or devices that can read items as they are presented, rather than relying on a recording prepared in advance. Such reading systems generally include text-to-speech conversion software, which is discussed in more detail in Chapter 14. A simple classification of reading systems is given in Figure 12.9. One of the main distinctions is between stand-alone reading systems and computer-based reading systems. Stand-alone reading systems and the Read IT project are discussed in the next two sections.

DAISY technology (discussed in Chapter 15) has been developed as a standard for audio output of printed material. The idea of a standard navigable format for visually impaired end-users to access information in audio form is clearly a good one. However, the time lag due to a number of factors, including the time spent in developing the format and in working for its acceptance, has meant that the technology has evolved in the meantime. Despite considerable hard work to publicise DAISY, it has not been taken up on a large scale by publishers, and many publishers are unaware of it or of how they could use it. It is hoped that this situation will change in due course.

From the end-user perspective, there are advantages in output that can be played on widely available standard devices, such as a CD, cassette or MP3 player, rather than requiring a special player. As this indicates, there are advantages in a design-for-all approach to providing information in audio format to anyone and everyone who might want to use it, including visually impaired and other print-disabled people. This may mean revisiting and updating the DAISY standard from a design-for-all perspective, or ensuring that visually impaired and other end-users have a choice of a number of different formats, including DAISY, so that they can choose the one best suited to their needs, or even use different formats in different circumstances.

12.4.1 Stand-alone Reading Systems

Stand-alone reading systems are independent systems that are able to scan a printed document (including letters, books, leaflets and newspapers) and produce an audio (or tactile) version of the document for visually impaired and blind readers. The sequence of operations carried out by reading systems is shown in Figure 12.10.

12 Accessible Information: An Overview

Figure 12.9. Stand-alone text-to-speech (TTS) technologies

Figure 12.10. Block diagram of reading system operations

These operations comprise the following three main stages:

Stage 1. The camera and scanning mechanism create an image file.

Stage 2. Optical recognition software and/or hardware converts the image file to a text file.

Stage 3. Text-to-speech software uses the text file to drive a speech synthesizer card and speaker unit, thereby producing an audio speech output.
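The three stages above amount to a simple linear pipeline. The following sketch illustrates that structure only; the stage functions are injected as callables so that real OCR and speech-synthesis engines could be substituted, and the stub implementations shown are invented placeholders, not part of any actual reading system.

```python
# Minimal sketch of the three-stage reading-system pipeline described above.
# The stages are injected as callables; the stubs below are illustrative
# placeholders, not real OCR or TTS engines.

from typing import Callable

def reading_pipeline(capture: Callable[[], bytes],
                     ocr: Callable[[bytes], str],
                     tts: Callable[[str], list]) -> list:
    """Run the three stages of a stand-alone reading system in sequence."""
    image = capture()   # Stage 1: camera/scanner produces an image file
    text = ocr(image)   # Stage 2: optical character recognition -> text file
    return tts(text)    # Stage 3: text drives the speech synthesizer

# Placeholder stages, for demonstration only
fake_capture = lambda: b"scanned-page"
fake_ocr = lambda img: "Bus 42 departs at 10:15"
fake_tts = lambda text: text.split()   # stands in for synthesized audio frames

audio = reading_pipeline(fake_capture, fake_ocr, fake_tts)
```

Separating the stages behind callable interfaces also reflects how commercial systems swap input and output options (scanner versus stored file, speaker versus screen) without changing the overall flow.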

Commercial reading systems generally have alternative input and output options. For instance, on the input side the system may be able to read from CD ROM or from stored files and, in addition to audio output using a speaker and headphones, the output may be displayed as text on a computer screen.


Figure 12.11a,b. Scanning and Reading Appliance (SARA™): a SARA™ in action; b the SARA™ control panel (photographs reproduced by kind permission of Freedom Scientific, USA)

Figure 12.11 shows photographs of the Scanning and Reading Appliance (SARA™) developed by Freedom Scientific, USA. Some of the key functions of SARA are listed in Table 12.1. As can be seen from Figure 12.11 and Table 12.1, SARA provides a highly convenient route to the audio transcription of printed information. However, it is not very portable and is therefore more suitable for applications in a fixed location than for use while moving around. A portable system, called Read IT, is discussed in the next section.

12.4.2 Read IT Project

Portable devices generally have the advantage of reducing costs, since the same device can be used in different locations, and are easier to handle. In the case of reading systems, there is a wide range of textual information, including menus, price tags, bus and train timetables, indicator boards and theatre programmes, found in different locations, that could not be read with a fixed system. In addition to the technical issues associated with portability, further technical challenges are posed by the wide variety of material to be read, the difficulties associated with reading handwritten and poor-quality texts, and the fact that some textual information, such as street signs and indicator boards, generally has to be read at a distance.


Table 12.1. Some technical features of the SARA™ reading appliance

Controls: Large, colour-coded, with tactile markings and symbols (see Figure 12.11b for layout). Search facilities: single word; single line; fast forward; rewind; move up page; move down page.

Speech control: Controls for speech rate and volume; selection of the voice from a voice set; choice of language from 17 options.

Input: Scanned documents (background scanning operation); files (.txt, .rtf, .doc, .pdf, .html); CD-ROM drive; DAISY books; microphone input.

Output: Stereo speakers (integral to the appliance); audio jack for headphones; text output to a computer screen with display options.

Some technical specifications: Power: 100–240 V, rear power jack input. Size: 50.8 × 8.89 × 30.48 cm. Weight: 8.16 kg. 20 GB hard disk drive; 256 MB RAM; 600 MHz processor.

One approach to producing a portable reading system for blind people is the prototype Read IT project (Chmiel et al. 2005), carried out by a student team from the Department of Computer Science and Management of Poznan University of Technology, Poland, under the guidance of Dr. Jacek Jelonek.

End-user aspects of the Read IT system

End-user involvement in the development of (assistive) technology systems from the start is crucial to ensure that the resulting device meets the needs of the end-user community and to reduce the likelihood of it being rejected. In the Read IT project, the development team worked with the Polish Association of Blind People to draw up a list of end-user requirements, which included the following (Chmiel et al. 2005):

1. The device should be comfortable (portable and lightweight) to wear and should integrate the user into the wider community, not identify them as different.

2. The user should have their hands free to engage in other activities whilst listening to the speech output.

3. The user should be able to hear other sounds as well as the generated speech.

4. The user should be able to move on to other tasks once positioning and capture of the text are complete.

5. The generated speech should be clearly understandable and resemble human speech.

These requirements from the end-user community were translated into design specifications and influenced the final design and implementation. Requirement 1 led to the device being lightweight, portable and as unobtrusive as possible. Requirements 2, 3 and 4 arise from safety considerations and the requirement for the user to, for instance, have their hands free to use a long cane and/or carry shopping while using the device. They resulted in the speech output being delivered to only one earphone, inserted directly into the ear. Requirement 4 enables the end-user to either relax or move on to other tasks once the text to be read has been located and captured. Requirement 5 was translated into a design specification for the quality of the speech synthesizer card used in the device.

Engineering issues and implementation

As illustrated in Figure 12.12, an image of the text is captured by a video camera positioned in the user’s sunglasses. The image is analysed and the identified text content drives the speech synthesizer card. The resulting speech output is then delivered to the user via a single earphone. The signal processing unit, speech synthesizer card and battery power supply are housed in a small box worn at waist level. Manual control is via a small hand-held Braille keypad. A full description of the development process is given in the Read IT report (Chmiel et al. 2005). In view of the requirement that the device should integrate the user into the wider community, the video camera could presumably be worn on glasses with plain lenses when there is little sun. However, difficulties could be encountered in transferring the device between different spectacle frames.

In contrast to technologies, such as mobile phones, where miniaturisation causes difficulties for blind and visually impaired people, it is component miniaturisation that has made portable reading devices, such as Read IT, feasible. In particular, an important feature is the miniature video camera that can be unobtrusively mounted on the user’s sunglasses. A PVI-430D video camera was selected; it captures 30 frames per second at a resolution of 640 × 480 pixels. The very small size of the video capture unit can be seen from Figure 12.13, where the microcontroller chip is just 5 × 5 mm. The camera range is between 0.4 and 0.8 m for standard-sized fonts. It is generally feasible to approach to within this distance of bus timetables, indicator boards and street signs.

Figure 12.12. Overview of the Read IT system (Chmiel et al. 2005)

Figure 12.13. Video capture unit (Chmiel et al. 2005)
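The camera figures quoted above (640 × 480 pixels at 30 frames per second) imply a substantial raw data rate, which helps explain why all signal processing is done on the worn device itself. The back-of-envelope calculation below uses those figures; the pixel formats are assumptions for illustration, as the source does not specify one.

```python
# Back-of-envelope raw data rate for the camera described above:
# 640 x 480 pixels at 30 frames per second.

WIDTH, HEIGHT, FPS = 640, 480, 30
pixels_per_second = WIDTH * HEIGHT * FPS   # 9,216,000 pixels/s

# Bytes per pixel under two common raw formats (assumed, not from the source)
BYTES_PER_PIXEL = {"grayscale (8-bit)": 1.0, "YUV 4:2:0": 1.5}

rates_mb_s = {fmt: pixels_per_second * bpp / 1e6
              for fmt, bpp in BYTES_PER_PIXEL.items()}
# grayscale -> 9.216 MB/s; YUV 4:2:0 -> 13.824 MB/s
```

Data rates of this order are straightforward to process locally but would have been impractical to stream continuously from a battery-powered wearable device in 2005.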

Software issues

The Read IT project used a mix of standard software and customised software and DSP algorithms developed by the project team. The steps involved in the digital signal processing are shown in Figure 12.14.

Two aspects of this digital signal processing architecture are especially interesting. First, there is a “navigation task” with associated “navigation messages”. This module generates voiced directional instructions to ensure that the user directs the camera at the text to be read. The audio message feedback loop was designed to optimise and enhance image capture and identification within the device. Once a satisfactory image has been captured, the important operations of analysing the captured video text can proceed. This involves a number of subtasks, including text segmentation, enhancement and recognition. This is

Figure 12.14. Digital signal processing framework for Read IT (Chmiel et al. 2005)
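The navigation feedback loop described above can be sketched as follows. This is an illustrative reconstruction, not the project’s actual code: the frames, the offset detector and the spoken messages are simplified stand-ins, invented here to show how voiced hints steer the camera until the text is usably framed and the image is handed on to segmentation and recognition.

```python
# Illustrative sketch of the Read IT navigation feedback loop (not the
# project's actual code). Spoken directional hints steer the camera until
# the detected text offset is within tolerance; the accepted frame is then
# passed on to the segmentation/enhancement/recognition subtasks.

def navigate_and_capture(frames, detect_offset, speak, tolerance=0.1):
    """Iterate over camera frames, issuing directional messages until the
    detected text offset (fraction of frame width, +ve = text right of
    centre) is within tolerance; return the first acceptable frame."""
    for frame in frames:
        offset = detect_offset(frame)
        if abs(offset) <= tolerance:
            speak("text located")
            return frame                 # ready for segmentation/recognition
        # Text right of centre -> user should turn left, and vice versa
        speak("move left" if offset > 0 else "move right")
    return None                          # text never found in the stream

# Demonstration with fake data: offsets shrink as the user re-aims the camera
messages = []
frames = [{"offset": 0.6}, {"offset": -0.3}, {"offset": 0.05}]
good = navigate_and_capture(frames, lambda f: f["offset"], messages.append)
```

In the real device the offset detection would itself be a DSP subtask operating on the video stream, and the messages would be rendered by the speech synthesizer card rather than collected in a list.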