- •Preface
- •Contents
- •1 Disability and Assistive Technology Systems
- •Learning Objectives
- •1.1 The Social Context of Disability
- •1.2 Assistive Technology Outcomes: Quality of Life
- •1.2.1 Some General Issues
- •1.2.2 Definition and Measurement of Quality of Life
- •1.2.3 Health Related Quality of Life Measurement
- •1.2.4 Assistive Technology Quality of Life Procedures
- •1.2.5 Summary and Conclusions
- •1.3 Modelling Assistive Technology Systems
- •1.3.1 Modelling Approaches: A Review
- •1.3.2 Modelling Human Activities
- •1.4 The Comprehensive Assistive Technology (CAT) Model
- •1.4.1 Justification of the Choice of Model
- •1.4.2 The Structure of the CAT Model
- •1.5 Using the Comprehensive Assistive Technology Model
- •1.5.1 Using the Activity Attribute of the CAT Model to Determine Gaps in Assistive Technology Provision
- •1.5.2 Conceptual Structure of Assistive Technology Systems
- •1.5.3 Investigating Assistive Technology Systems
- •1.5.4 Analysis of Assistive Technology Systems
- •1.5.5 Synthesis of Assistive Technology Systems
- •1.6 Chapter Summary
- •Questions
- •Projects
- •References
- •2 Perception, the Eye and Assistive Technology Issues
- •Learning Objectives
- •2.1 Perception
- •2.1.1 Introduction
- •2.1.2 Common Laws and Properties of the Different Senses
- •2.1.3 Multisensory Perception
- •2.1.4 Multisensory Perception in the Superior Colliculus
- •2.1.5 Studies of Multisensory Perception
- •2.2 The Visual System
- •2.2.1 Introduction
- •2.2.2 The Lens
- •2.2.3 The Iris and Pupil
- •2.2.4 Intraocular Pressure
- •2.2.5 Extraocular Muscles
- •2.2.6 Eyelids and Tears
- •2.3 Visual Processing in the Retina, Lateral Geniculate Nucleus and the Brain
- •2.3.1 Nerve Cells
- •2.3.2 The Retina
- •2.3.3 The Optic Nerve, Optic Tract and Optic Radiation
- •2.3.4 The Lateral Geniculate Body or Nucleus
- •2.3.5 The Primary Visual or Striate Cortex
- •2.3.6 The Extrastriate Visual Cortex and the Superior Colliculus
- •2.3.7 Visual Pathways
- •2.4 Vision in Action
- •2.4.1 Image Formation
- •2.4.2 Accommodation
- •2.4.3 Response to Light
- •2.4.4 Colour Vision
- •2.4.5 Binocular Vision and Stereopsis
- •2.5 Visual Impairment and Assistive Technology
- •2.5.1 Demographics of Visual Impairment
- •2.5.2 Illustrations of Some Types of Visual Impairment
- •2.5.3 Further Types of Visual Impairment
- •2.5.4 Colour Blindness
- •2.5.5 Corrective Lenses
- •2.6 Chapter Summary
- •Questions
- •Projects
- •References
- •3 Sight Measurement
- •Learning Objectives
- •3.1 Introduction
- •3.2 Visual Acuity
- •3.2.1 Using the Chart
- •3.2.2 Variations in Measuring Visual Acuity
- •3.3 Field of Vision Tests
- •3.3.1 The Normal Visual Field
- •3.3.2 The Tangent Screen
- •3.3.3 Kinetic Perimetry
- •3.3.4 Static Perimetry
- •3.4 Pressure Measurement
- •3.5 Biometry
- •3.6 Ocular Examination
- •3.7 Optical Coherence Tomography
- •3.7.1 Echo Delay
- •3.7.2 Low Coherence Interferometry
- •3.7.3 An OCT Scanner
- •3.8 Ocular Electrophysiology
- •3.8.1 The Electrooculogram (EOG)
- •3.8.2 The Electroretinogram (ERG)
- •3.8.3 The Pattern Electroretinogram
- •3.8.4 The Visual Evoked Cortical Potential
- •3.8.5 Multifocal Electrophysiology
- •3.9 Chapter Summary
- •Glossary
- •Questions
- •Projects
- •4 Haptics as a Substitute for Vision
- •Learning Objectives
- •4.1 Introduction
- •4.1.1 Physiological Basis
- •4.1.2 Passive Touch, Active Touch and Haptics
- •4.1.3 Exploratory Procedures
- •4.2 Vision and Haptics Compared
- •4.3 The Capacity of Bare Fingers in Real Environments
- •4.3.1 Visually Impaired People’s Use of Haptics Without any Technical Aid
- •4.3.2 Speech Perceived by Hard-of-hearing People Using Bare Hands
- •4.3.3 Natural Capacity of Touch and Evaluation of Technical Aids
- •4.4 Haptic Low-tech Aids
- •4.4.1 The Long Cane
- •4.4.2 The Guide Dog
- •4.4.3 Braille
- •4.4.4 Embossed Pictures
- •4.4.5 The Main Lesson from Low-tech Aids
- •4.5 Matrices of Point Stimuli
- •4.5.1 Aids for Orientation and Mobility
- •4.5.2 Aids for Reading Text
- •4.5.3 Aids for Reading Pictures
- •4.6 Computer-based Aids for Graphical Information
- •4.6.1 Aids for Graphical User Interfaces
- •4.6.2 Tactile Computer Mouse
- •4.7 Haptic Displays
- •4.7.1 Information Available via a Haptic Display
- •4.7.2 What Information Can Be Obtained with the Reduced Information?
- •4.7.3 Haptic Displays as Aids for the Visually Impaired
- •4.8 Chapter Summary
- •4.9 Concluding Remarks
- •Questions
- •Projects
- •References
- •5 Mobility: An Overview
- •Learning Objectives
- •5.1 Introduction
- •5.2 The Travel Activity
- •5.2.1 Understanding Mobility
- •5.2.2 Assistive Technology Systems for the Travel Process
- •5.3 The Historical Development of Travel Aids for Visually Impaired and Blind People
- •5.4 Obstacle Avoidance AT: Guide Dogs and Robotic Guide Walkers
- •5.4.1 Guide Dogs
- •5.4.2 Robotic Guides and Walkers
- •5.5 Obstacle Avoidance AT: Canes
- •5.5.1 Long Canes
- •5.5.2 Technology Canes
- •5.6 Other Mobility Assistive Technology Approaches
- •5.6.1 Clear-path Indicators
- •5.6.2 Obstacle and Object Location Detectors
- •5.6.3 The vOICe System
- •5.7 Orientation Assistive Technology Systems
- •5.7.1 Global Positioning System Orientation Technology
- •5.7.2 Other Technology Options for Orientation Systems
- •5.8 Accessible Environments
- •5.9 Chapter Summary
- •Questions
- •Projects
- •References
- •6 Mobility AT: The Batcane (UltraCane)
- •Learning Objectives
- •6.1 Mobility Background and Introduction
- •6.2 Principles of Ultrasonics
- •6.2.1 Ultrasonic Waves
- •6.2.2 Attenuation and Reflection Interactions
- •6.2.3 Transducer Geometry
- •6.3 Bats and Signal Processing
- •6.3.1 Principles of Bat Sonar
- •6.3.2 Echolocation Call Structures
- •6.3.3 Signal Processing Capabilities
- •6.3.4 Applicability of Bat Echolocation to Sonar System Design
- •6.4 Design and Construction Issues
- •6.4.1 Outline Requirement Specification
- •6.4.2 Ultrasonic Spatial Sensor Subsystem
- •6.4.3 Trial Prototype Spatial Sensor Arrangement
- •6.4.4 Tactile User Interface Subsystem
- •6.4.5 Cognitive Mapping
- •6.4.6 Embedded Processing Control Requirements
- •6.5 Concept Phase and Engineering Prototype Phase Trials
- •6.6 Case Study in Commercialisation
- •6.7 Chapter Summary
- •Questions
- •Projects
- •References
- •7 Navigation AT: Context-aware Computing
- •Learning Objectives
- •7.1 Defining the Orientation/Navigation Problem
- •7.1.1 Orientation, Mobility and Navigation
- •7.1.2 Traditional Mobility Aids
- •7.1.3 Limitations of Traditional Aids
- •7.2 Cognitive Maps
- •7.2.1 Learning and Acquiring Spatial Information
- •7.2.2 Factors that Influence How Knowledge Is Acquired
- •7.2.3 The Structure and Form of Cognitive Maps
- •7.3 Overview of Existing Technologies
- •7.3.1 Technologies for Distant Navigation
- •7.3.2 User Interface Output Technologies
- •7.4 Principles of Mobile Context-aware Computing
- •7.4.1 Adding Context to User-computer Interaction
- •7.4.2 Acquiring Useful Contextual Information
- •7.4.3 Capabilities of Context-awareness
- •7.4.4 Application of Context-aware Principles
- •7.4.5 Technological Challenges and Unresolved Usability Issues
- •7.5 Test Procedures
- •7.5.1 Human Computer Interaction (HCI)
- •7.5.2 Cognitive Mapping
- •7.5.3 Overall Approach
- •7.6 Future Positioning Technologies
- •7.7 Chapter Summary
- •7.7.1 Conclusions
- •Questions
- •Projects
- •References
- •Learning Objectives
- •8.1 Defining the Navigation Problem
- •8.1.1 What is the Importance of Location Information?
- •8.1.2 What Mobility Tools and Traditional Maps are Available for the Blind?
- •8.2 Principles of Global Positioning Systems
- •8.2.1 What is the Global Positioning System?
- •8.2.2 Accuracy of GPS: Some General Issues
- •8.2.3 Accuracy of GPS: Some Technical Issues
- •8.2.4 Frequency Spectrum of GPS, Present and Future
- •8.2.5 Other GPS Systems
- •8.3 Application of GPS Principles
- •8.4 Design Issues
- •8.5 Development Issues
- •8.5.1 Choosing an Appropriate Platform
- •8.5.2 Choosing the GPS Receiver
- •8.5.3 Creating a Packaged System
- •8.5.4 Integration vs Stand-alone
- •8.6 User Interface Design Issues
- •8.6.1 How to Present the Information
- •8.6.2 When to Present the Information
- •8.6.3 What Information to Present
- •8.7 Test Procedures and Results
- •8.8 Case Study in Commercialisation
- •8.8.1 Understanding the Value of the Technology
- •8.8.2 Limitations of the Technology
- •8.8.3 Ongoing Development
- •8.9 Chapter Summary
- •Questions
- •Projects
- •References
- •9 Electronic Travel Aids: An Assessment
- •Learning Objectives
- •9.1 Introduction
- •9.2 Why Do an Assessment?
- •9.3 Methodologies for Assessments of Electronic Travel Aids
- •9.3.1 Eliciting User Requirements
- •9.3.2 Developing a User Requirements Specification and Heuristic Evaluation
- •9.3.3 Hands-on Assessments
- •9.3.4 Methodology Used for Assessments in this Chapter
- •9.4 Modern-day Electronic Travel Aids
- •9.4.1 The Distinction Between Mobility and Navigation Aids
- •9.4.2 The Distinction Between Primary and Secondary Aids
- •9.4.3 User Requirements: Mobility and Navigation Aids
- •9.4.4 Mobility Aids
- •9.4.5 Mobility Aids: Have They Solved the Mobility Challenge?
- •9.4.6 Navigation Aids
- •9.4.7 Navigation Aids: Have They Solved the Navigation Challenge?
- •9.5 Training
- •9.6 Chapter Summary and Conclusions
- •Questions
- •Projects
- •References
- •10 Accessible Environments
- •Learning Objectives
- •10.1 Introduction
- •10.1.1 Legislative and Regulatory Framework
- •10.1.2 Accessible Environments: An Overview
- •10.1.3 Principles for the Design of Accessible Environments
- •10.2 Physical Environments: The Streetscape
- •10.2.1 Pavements and Pathways
- •10.2.2 Road Crossings
- •10.2.3 Bollards and Street Furniture
- •10.3 Physical Environments: Buildings
- •10.3.1 General Exterior Issues
- •10.3.2 General Interior Issues
- •10.3.4 Signs and Notices
- •10.3.5 Interior Building Services
- •10.4 Environmental Information and Navigation Technologies
- •10.4.1 Audio Information System: General Issues
- •10.4.2 Some Technologies for Environmental Information Systems
- •10.5 Accessible Public Transport
- •10.5.1 Accessible Public Transportation: Design Issues
- •10.6 Chapter Summary
- •Questions
- •Projects
- •References
- •11 Accessible Bus System: A Bluetooth Application
- •Learning Objectives
- •11.1 Introduction
- •11.2 Bluetooth Fundamentals
- •11.2.1 Brief History of Bluetooth
- •11.2.2 Bluetooth Power Class
- •11.2.3 Protocol Stack
- •11.2.4 Bluetooth Profile
- •11.2.5 Piconet
- •11.3 Design Issues
- •11.3.1 System Architecture
- •11.3.2 Hardware Requirements
- •11.3.3 Software Requirements
- •11.4 Developmental Issues
- •11.4.1 Bluetooth Server
- •11.4.2 Bluetooth Client (Mobile Device)
- •11.4.3 User Interface
- •11.5 Commercialisation Issues
- •11.6 Chapter Summary
- •Questions
- •Projects
- •References
- •12 Accessible Information: An Overview
- •Learning Objectives
- •12.1 Introduction
- •12.2 Low Vision Aids
- •12.2.1 Basic Principles
- •12.3 Low Vision Assistive Technology Systems
- •12.3.1 Large Print
- •12.3.2 Closed Circuit Television Systems
- •12.3.3 Video Magnifiers
- •12.3.4 Telescopic Assistive Systems
- •12.4 Audio-transcription of Printed Information
- •12.4.1 Stand-alone Reading Systems
- •12.4.2 Read IT Project
- •12.5 Tactile Access to Information
- •12.5.1 Braille
- •12.5.2 Moon
- •12.5.3 Braille Devices
- •12.6 Accessible Computer Systems
- •12.6.1 Input Devices
- •12.6.2 Output Devices
- •12.6.3 Computer-based Reading Systems
- •12.6.4 Accessible Portable Computers
- •12.7 Accessible Internet
- •12.7.1 World Wide Web Guidelines
- •12.7.2 Guidelines for Web Authoring Tools
- •12.7.3 Accessible Adobe Portable Document Format (PDF) Documents
- •12.7.4 Bobby Approval
- •12.8 Telecommunications
- •12.8.1 Voice Dialling General Principles
- •12.8.2 Talking Caller ID
- •12.8.3 Mobile Telephones
- •12.9 Chapter Summary
- •Questions
- •Projects
- •References
- •13 Screen Readers and Screen Magnifiers
- •Learning Objectives
- •13.1 Introduction
- •13.2 Overview of Chapter
- •13.3 Interacting with a Graphical User Interface
- •13.4 Screen Magnifiers
- •13.4.1 Overview
- •13.4.2 Magnification Modes
- •13.4.3 Other Interface Considerations
- •13.4.4 The Architecture and Implementation of Screen Magnifiers
- •13.5 Screen Readers
- •13.5.1 Overview
- •13.5.2 The Architecture and Implementation of a Screen Reader
- •13.5.3 Using a Braille Display
- •13.5.4 User Interface Issues
- •13.6 Hybrid Screen Reader Magnifiers
- •13.7 Self-magnifying Applications
- •13.8 Self-voicing Applications
- •13.9 Application Adaptors
- •13.10 Chapter Summary
- •Questions
- •Projects
- •References
- •14 Speech, Text and Braille Conversion Technology
- •Learning Objectives
- •14.1 Introduction
- •14.1.1 Introducing Mode Conversion
- •14.1.2 Outline of the Chapter
- •14.2 Prerequisites for Speech and Text Conversion Technology
- •14.2.1 The Spectral Structure of Speech
- •14.2.2 The Hierarchical Structure of Spoken Language
- •14.2.3 Prosody
- •14.3 Speech-to-text Conversion
- •14.3.1 Principles of Pattern Recognition
- •14.3.2 Principles of Speech Recognition
- •14.3.3 Equipment and Applications
- •14.4 Text-to-speech Conversion
- •14.4.1 Principles of Speech Production
- •14.4.2 Principles of Acoustical Synthesis
- •14.4.3 Equipment and Applications
- •14.5 Braille Conversion
- •14.5.1 Introduction
- •14.5.2 Text-to-Braille Conversion
- •14.5.3 Braille-to-text Conversion
- •14.6 Commercial Equipment and Applications
- •14.6.1 Speech vs Braille
- •14.6.2 Speech Output in Devices for Daily Life
- •14.6.3 Portable Text-based Devices
- •14.6.4 Access to Computers
- •14.6.5 Reading Machines
- •14.6.6 Access to Telecommunication Devices
- •14.7 Discussion and the Future Outlook
- •14.7.1 End-user Studies
- •14.7.2 Discussion and Issues Arising
- •14.7.3 Future Developments
- •Questions
- •Projects
- •References
- •15 Accessing Books and Documents
- •Learning Objectives
- •15.1 Introduction: The Challenge of Accessing the Printed Page
- •15.2 Basics of Optical Character Recognition Technology
- •15.2.1 Details of Optical Character Recognition Technology
- •15.2.2 Practical Issues with Optical Character Recognition Technology
- •15.3 Reading Systems
- •15.4 DAISY Technology
- •15.4.1 DAISY Full Audio Books
- •15.4.2 DAISY Full Text Books
- •15.4.3 DAISY and Other Formats
- •15.5 Players
- •15.6 Accessing Textbooks
- •15.7 Accessing Newspapers
- •15.8 Future Technology Developments
- •15.9 Chapter Summary and Conclusion
- •15.9.1 Chapter Summary
- •15.9.2 Conclusion
- •Questions
- •Projects
- •References
- •Learning Objectives
- •16.1 Introduction
- •16.1.1 Print Impairments
- •16.1.2 Music Notation
- •16.2 Overview of Accessible Music
- •16.2.1 Formats
- •16.2.2 Technical Aspects
- •16.3 Some Recent Initiatives and Projects
- •16.3.2 Play 2
- •16.3.3 Dancing Dots
- •16.3.4 Toccata
- •16.4 Problems to Be Overcome
- •16.4.1 A Content Processing Layer
- •16.4.2 Standardization of Accessible Music Technology
- •16.5 Unifying Accessible Design, Technology and Musical Content
- •16.5.1 Braille Music
- •16.5.2 Talking Music
- •16.6 Conclusions
- •16.6.1 Design for All or Accessibility from Scratch
- •16.6.2 Applying Design for All in Emerging Standards
- •16.6.3 Accessibility in Emerging Technology
- •Questions
- •Projects
- •References
- •17 Assistive Technology for Daily Living
- •Learning Objectives
- •17.1 Introduction
- •17.2 Personal Care
- •17.2.1 Labelling Systems
- •17.2.2 Healthcare Monitoring
- •17.3 Time-keeping, Alarms and Alerting
- •17.3.1 Time-keeping
- •17.3.2 Alarms and Alerting
- •17.4 Food Preparation and Consumption
- •17.4.1 Talking Kitchen Scales
- •17.4.2 Talking Measuring Jug
- •17.4.3 Liquid Level Indicator
- •17.4.4 Talking Microwave Oven
- •17.4.5 Talking Kitchen and Remote Thermometers
- •17.4.6 Braille Salt and Pepper Set
- •17.5 Environmental Control and Use of Appliances
- •17.5.1 Light Probes
- •17.5.2 Colour Probes
- •17.5.3 Talking and Tactile Thermometers and Barometers
- •17.5.4 Using Appliances
- •17.6 Money, Finance and Shopping
- •17.6.1 Mechanical Money Indicators
- •17.6.2 Electronic Money Identifiers
- •17.6.3 Electronic Purse
- •17.6.4 Automatic Teller Machines (ATMs)
- •17.7 Communications and Access to Information: Other Technologies
- •17.7.1 Information Kiosks and Other Self-service Systems
- •17.7.2 Using Smart Cards
- •17.7.3 EZ Access®
- •17.8 Chapter Summary
- •Questions
- •Projects
- •References
- •Learning Objectives
- •18.1 Introduction
- •18.2 Education: Learning and Teaching
- •18.2.1 Accessing Educational Processes and Approaches
- •18.2.2 Educational Technologies, Devices and Tools
- •18.3 Employment
- •18.3.1 Professional and Person-centred
- •18.3.2 Scientific and Technical
- •18.3.3 Administrative and Secretarial
- •18.3.4 Skilled and Non-skilled (Manual) Trades
- •18.3.5 Working Outside
- •18.4 Recreational Activities
- •18.4.1 Accessing the Visual, Audio and Performing Arts
- •18.4.2 Games, Puzzles, Toys and Collecting
- •18.4.3 Holidays and Visits: Museums, Galleries and Heritage Sites
- •18.4.4 Sports and Outdoor Activities
- •18.4.5 DIY, Art and Craft Activities
- •18.5 Chapter Summary
- •Questions
- •Projects
- •References
- •Biographical Sketches of the Contributors
- •Index
14.6.6 Access to Telecommunication Devices
There has been considerable development of telecommunication systems over the last decade and this has had an impact on telecommunication use by visually impaired and blind people. “Historically, people with a hearing disability have been the group facing the most problems when using telephones; however, with the ever increasing reliance on visual means for displaying information, it is increasingly visually impaired people who have been confronted with access problems” (Roe, 2001, p 30).
Speech technology can provide potential solutions, as in the case of the following input/output functions for mobile phones:
•Speech recognition is frequently used for voice-dialling. This feature was originally developed mainly for hands-free telephony in cars.
•Speech synthesis will be increasingly used for improving the user interface (speech MMI), caller name announcement, reading short messages (SMS) and remote access to e-mails.
Although these features were developed originally to provide improved performance for sighted users, they are very useful for visually impaired people and illustrate the benefits of a design for all approach. The technical prerequisites are the development of embedded speech input/output solutions (Hoffmann et al. 2004).
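Downstream of the recognizer, voice-dialling reduces to matching the recognized name against the stored contact list; fuzzy matching gives some tolerance to small recognition errors. The following is a minimal sketch using Python's standard difflib module; the contact names, phone numbers and the 0.7 cutoff are all invented for illustration and are not taken from any product described in this chapter:

```python
import difflib

# Hypothetical contact list; keys are the names the user has enrolled
contacts = {"alice": "+44 20 7946 0001",
            "bob": "+44 20 7946 0002",
            "doctor smith": "+44 20 7946 0003"}

def voice_dial(recognized_name):
    """Return the number for the closest enrolled name, or None if
    nothing matches well enough (the cutoff avoids dialling a wrong
    number on a badly misrecognized name)."""
    match = difflib.get_close_matches(recognized_name.lower(),
                                      contacts, n=1, cutoff=0.7)
    return contacts[match[0]] if match else None

print(voice_dial("doctor smyth"))    # small recognition error still resolves
print(voice_dial("unknown person"))  # None: better to re-prompt than misdial
```

The cutoff reflects the usability trade-off discussed throughout this chapter: a system that occasionally asks the user to repeat a name is far more acceptable than one that silently dials the wrong person.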
Despite the benefits of design for all, it cannot resolve all problems and visually impaired telecommunications users therefore also require some special equipment. For instance, the Braillino system is shown in Figure 14.21b in combination with the Nokia Communicator. However, it can be used more generally with any mobile phone that uses the Symbian operating system, the global industry standard operating system for smartphones (www.symbian.com). This includes the Series 60 phones (without an alphanumeric keyboard) and the Series 80 phones (with an organizer function and an alphanumeric keyboard). The connection can be wireless via Bluetooth. From a functional point of view, the communication software (called Talks&Braille) acts as a screen reader for the Symbian operating system.
14.7 Discussion and the Future Outlook
14.7.1 End-user Studies
Potential users of speech technology would like to have (comparative) information on the performance of the available systems. However, it is difficult to obtain global comparative evaluations, due to the complexity of the systems and the fact that the evaluation criteria depend on the intended application. The studies carried out to date can be grouped and discussed as follows.
Evaluation of research systems
Progress in speech technology is normally measured in terms of improved word recognition rates (for recognizers) or improved scores when rating the naturalness (for synthesizers). Therefore, there are presentations giving an ongoing evaluation of research systems at the leading conferences. The availability of common databases allows the results of the evaluation of different systems to be compared. However, these research-oriented results relate to systems that are not yet commercially available, rather than the current state of the market.
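The word recognition rates reported at such conferences are conventionally derived from the word error rate (WER), the word-level edit distance between the recognizer's output and a reference transcript, normalised by the reference length. A minimal sketch follows; the function name and the test sentences are illustrative, not taken from any of the evaluations cited here:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed by dynamic programming over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("please call the office now", "please call office now now")
print(f"WER = {wer:.2f}, recognition rate = {1 - wer:.0%}")
```

Because different systems can be scored against the same reference transcripts, this measure is what makes the common-database comparisons mentioned above possible.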
Comparison with human performance
It is natural to compare speech technology with human performance. Every user of speech technology soon notices that it does not perform nearly as well as a person, but there are few quantitative assessments of this difference. A fundamental investigation for speech recognizers was carried out by Lippmann (1997), who demonstrated how the recognition rate breaks down in the presence of environmental noise, whereas human listeners perform substantially better. Corresponding results can be obtained by rating the quality of speech synthesis using a mean opinion score (MOS) scale ranging from 1 (bad) to 5 (excellent). The naturalness of human speech is rated close to 5, but the output of TTS systems is generally rated somewhere in the middle range, between 1.73 and 3.74 according to the survey by Alvarez and Huckvale (2002). Considerable further research will be required to close the gap between speech recognition or speech synthesis systems and a human listener or speaker, respectively.
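The mean opinion score itself is simply the arithmetic mean of listeners' ratings on the 1-to-5 scale. A short sketch follows; the ten ratings are made up for illustration and are not taken from the Alvarez and Huckvale survey:

```python
def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings on the 1 (bad) to 5 (excellent)
    MOS scale."""
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("MOS ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)

# Hypothetical ratings of one TTS voice by ten listeners
ratings = [3, 4, 2, 3, 3, 4, 3, 2, 4, 3]
print(f"MOS = {mean_opinion_score(ratings):.2f}")  # 3.10, a mid-range score
```

A result in this region would sit squarely within the 1.73 to 3.74 band reported for TTS systems, well below the near-5 scores given to natural human speech.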
Evaluation of commercial systems
Before including speech technology in a product, a company generally evaluates a number of competing systems, though the results are only occasionally published. This type of study gives an interesting insight into the real performance of the available products. For example, Maase et al. (2003) investigated the performance of command and control speech recognizers for controlling a kitchen device. Usability studies showed that users accept this kind of control for recognition rates greater than 85%. Tests with eight different products showed that this performance was never reached in real environments. Representative results are shown in Figure 14.23.
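The acceptance criterion from that study can be expressed as a simple check over a batch of command trials. The sketch below is illustrative only: the command vocabulary and the outcome of each trial are invented, not data from Maase et al.:

```python
def recognition_rate(trials):
    """Fraction of command-and-control trials the recognizer got right.
    Each trial is a pair (spoken_command, recognized_command)."""
    correct = sum(1 for spoken, recognized in trials if spoken == recognized)
    return correct / len(trials)

ACCEPTANCE_THRESHOLD = 0.85  # users accepted voice control above this rate

# Hypothetical trials in a noisy kitchen environment
trials = [("start", "start"), ("stop", "stop"), ("hotter", "water"),
          ("stop", "stop"), ("start", "start"), ("colder", "colder")]
rate = recognition_rate(trials)
print(f"rate = {rate:.0%}, accepted = {rate > ACCEPTANCE_THRESHOLD}")
```

Even one misrecognition in six trials already drops the rate to about 83%, below the acceptance threshold, which illustrates how demanding the 85% criterion is under real environmental noise.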
General product studies are very time-consuming and expensive, and therefore require a sponsor who does not have a vested interest in any of the products. For instance, the study of ten different dictation systems (Flach et al. 2000) mentioned in Section 14.3.3 was originally produced for a computer journal. A more recent study (Stiftung Warentest 2004) of six dictation systems was carried out without publishing the recognition rates; the system with the best performance is indicated in Table 14.8.
Figure 14.23a,b. Selected results from the study of Maase et al. (2003). The diagrams show the recognition rate of selected C&C recognizers for different noises (a) and different speaker positions (b). The speaker positions describe different places in the usability lab with growing distance (from 1 to 7 m). Reprinted by courtesy of the authors
Evaluation for user groups with special needs
There is clearly a need for studies of speech support and dictation systems for blind and visually impaired people. Unfortunately, there is a distinct lack of large scale user studies of speech support systems for this user group. However, there are several more general studies which include consideration of speech technology to a certain extent. A number of such investigations have considered improving learning environments for blind students (Kahlisch 1998). Another emerging field is the study of the assistive technology needs for elderly people. Since many elderly people have acquired visual impairments, these studies include useful material on speech-related technologies. Figure 14.24 presents an example.
14.7.2 Discussion and Issues Arising
Figure 14.24. Example of a usability study. The diagram shows the acceptance of speech-controlled services by different user groups according to the study of Hampicke (2004). Reprinted by courtesy of the author. Score of 6: in any case. Score of 3: medium. Score of 0: in no case. The legend describes the grade of visual impairment

An overview of the remarks in this chapter shows that the performance of speech input/output systems is by no means perfect, despite improved algorithms, larger databases, increased memory and growing computing power. In general, this still somewhat disappointing performance is due to the extreme complexity of human speech processing, which is difficult to approximate satisfactorily with technical systems. Although there is not space to discuss the reasons in detail, some of them are briefly summarized in Table 14.10. Examining this table leads to the following conclusions about important future directions for basic research in speech technology:
•Speech understanding.
•Acoustic front end.
•Modelling human speech and language processing.
These topics are all highly interdisciplinary and will require interdisciplinary work.
14.7.3 Future Developments
As discussed in this chapter, speech technology has established itself as a stable and successful component of assistive technology. It is also becoming increasingly successful in other fields with a greater economic impact, including the telecommunications area, for communication with call centres and for telephone banking. Although beyond the remit of this chapter, a survey of user opinions of this technology would be interesting, since there is at least anecdotal evidence that users prefer to communicate with a person and are highly dissatisfied with call centres. According to recent data (Sohn 2004), the worldwide turnover in business applications of speech technology will grow from $540 million currently to $1600 million in 2007.
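As a hedged aside, the quoted forecast implies roughly a threefold growth over the forecast period. Assuming the "$540 million currently" refers to a 2004 baseline (the publication year of the cited data, not stated explicitly in the forecast), the implied compound annual growth rate works out as follows:

```python
# Implied compound annual growth rate (CAGR) of the quoted market forecast,
# assuming the $540 million baseline refers to 2004.
baseline, forecast, years = 540.0, 1600.0, 2007 - 2004

cagr = (forecast / baseline) ** (1 / years) - 1
print(f"total growth: {forecast / baseline:.1f}x, implied CAGR: {cagr:.0%}")
```

That is, the forecast amounts to growth of roughly 44% per year, which underlines how rapidly commercial interest in speech technology was expected to expand.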
This growth in the use of speech technology is not surprising in view of the importance of speech in telephone applications and consequently also for their automation. The importance of speech input/output systems relative to other media is likely to grow, as can be seen from Table 14.11.
What will this tendency mean for blind and visually impaired people? Developments in speech technology will improve access to interfaces for an increasing range of applications for this group (though not for deafblind people). The resulting benefits are likely to be substantial and cover applications ranging from access to numerous knowledge sources to improved accessibility of household appliances.
Table 14.10. Current research problems in speech technology, explained in terms of the general scheme of a speech processing system (Figure 14.4)

| Where are the problems localized in Figure 14.4? | How can the problems be described? | What can research do to solve the problems? | Examples of first solutions |
|---|---|---|---|
| At the top of the figure | Our systems do not understand what they do. The scheme ends at text level, without semantic components | Develop speech understanding; cooperate with computational linguistics/AI/semiotics | Speech-to-speech translation systems like Verbmobil (Wahlster 2000); in speech synthesis, concept-to-speech (CTS) instead of TTS |
| At the bottom of the figure | The acoustic channel between the user and the converters (microphone or loudspeaker, respectively) is still neglected in most cases | Consider the system (recognizer or synthesizer, respectively) and the acoustic environment as a unit and develop the "acoustic front end" | Acoustic signal processing such as microphone arrays, noise suppression, source separation and directed sound supply |
| In the components of the figure | Because our understanding of human speech processing is far from an applicable level, the models we use are more or less mathematical or empirical | Although a technical system need not be a close copy of its biological counterpart, we need substantially more knowledge of human speech production and perception | Many activities in modelling prosody in close cooperation between engineers and phoneticians during the last decade; research systems which model human acoustic processing |
Table 14.11. How to interact with future systems? An overview from Weyrich (2003)

| System | Interaction media |
|---|---|
| Small devices | Speech |
| Service robots | Speech and gestures, artificial skin, emotions |
| Federation of systems | Speech and gestures, emotions |
| e-Business | Active dialogue systems, interactive multimedia |
| Augmented reality systems | Speech, gestures |
Talking products which are of interest to both sighted and visually impaired people are more attractive to companies because of their larger markets, and this type of product is therefore more likely to be widely available from standard suppliers at a reasonable price than specialised products for visually impaired people. For instance, blind and many visually impaired users require speech (or tactile) output to state the function of the key being pressed or the knob setting on the (complex) control panel of a washing machine. This audio option may also be of interest to sighted users. The inclusion of both speech and tactile output could be considered part of a design for all approach but, as already indicated, though design for all should be part of good design practice, it will never totally replace the need for assistive devices.
There is therefore considerable potential for increasing accessibility for blind and visually impaired people, though further technical developments will be required. However, it should also be noted that access to new technologies is limited by a number of factors, including geography and poverty. The term 'digital divide' is often used to describe the difference between people who do and do not have access to modern technologies and the resulting disadvantages, whereas the term eInclusion refers to access to the information society by disabled people and other potentially disadvantaged groups. While it is important to ensure that blind and visually impaired people are able to participate fully in the information society, it should also be recognised that some people, both blind and sighted, do not like technology. It will therefore be important to ensure that low technology accessibility solutions remain available for blind and visually impaired people and that information is provided in a number of different formats, including but not solely electronically.
Speech and language technology will always be compared to natural human speech and language. Therefore, regardless of progress, they are likely to be found wanting for a long time to come, if not permanently. This presents an ongoing challenge, which is probably much greater than that encountered in many other disciplines. As Waibel and Lee (1990) state in their preface to Readings in Speech Recognition: “Many advances have been made during these past decades; but every new technique and every solved puzzle opens a host of new questions and points us in new directions. Indeed, speech is such an intimate expression of our humanity—of our thoughts and emotions—that speech recognition is likely to remain an intellectual frontier as long as we search for a deeper understanding of ourselves in general, and intelligent behaviour in particular.”
Acknowledgement. As can be seen from the list of references, the material in this chapter is based on research results and teaching material of the chair for speech communication at the Technische Universität Dresden. The author would like to take the opportunity to thank his team for their fruitful cooperation on many projects.
Special thanks for helpful discussions and support to Professor Dieter Mehnert, formerly at the Humboldt-Universität zu Berlin, Professor Klaus Fellbaum, Brandenburgische Technische Universität Cottbus, Professor Gerhard Weber, Universität Kiel, and Dr. Lothar Seveke, Computer für Behinderte GmbH, Dresden.