CHAPTER 16

Speech Recognition and Synthesis

Speech Recognition

It’s important to distinguish between the speech engine and the applications that call the engine. The Mac OS X speech-recognition engine

• is speaker independent. Users don’t have to invest any time training it to recognize their voice before they can use it.

• supports continuous speech. Users don’t have to pause between words.

• has a large vocabulary (more than 120,000 words) and linguistic analysis to predict the correct pronunciation of words not in its dictionary.

• works with “far-field” microphones, so users don’t have to tether themselves to the computer with a headset. In addition, most Macintosh computers have a built-in microphone. All microphones in Macintosh computers are optimized to work well with the Mac OS X speech-recognition engine.

• works with a finite-state grammar. This is the most successful general-purpose speech technology and is optimal for uses such as interactive dialogs, command-and-control, and language/literacy. It is not optimal, however, for unrestricted dictation.
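To make the idea of a finite-state grammar concrete, here is a minimal sketch in Python. It is not the Mac OS X engine or any Apple API; the states, words, and the `accepts` function are hypothetical and only illustrate how a fixed set of word transitions limits what can be recognized, which is why this approach suits command-and-control but not unrestricted dictation.

```python
# Illustrative finite-state grammar: each state maps an accepted word to the next state.
# The states and phrases are hypothetical; they are not part of any Mac OS X API.
GRAMMAR = {
    "start": {"open": "verb", "close": "verb"},
    "verb": {"the": "article", "mail": "end", "browser": "end"},
    "article": {"mail": "end", "browser": "end"},
}


def accepts(utterance: str) -> bool:
    """Return True if the word sequence follows a path from 'start' to 'end'."""
    state = "start"
    for word in utterance.lower().split():
        transitions = GRAMMAR.get(state, {})
        if word not in transitions:
            return False  # the word is not allowed in this state
        state = transitions[word]
    return state == "end"
```

With this grammar, `accepts("open the mail")` succeeds, while an incomplete phrase such as "open the" or a free-form sentence is not accepted.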

In order to have the most flexibility in using speech recognition, an application should call the speech engine functions directly. Doing so requires the following steps:

1. Tell the engine what to listen for (that is, define what users can say).

2. Start listening.

3. Act on the message sent to the application when the engine hears a defined command.
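The three steps can be sketched as follows. This is a simulation under stated assumptions, not the speech engine's actual functions: the `CommandListener` class, its method names, and the sample phrase are all hypothetical, and recognition is simulated by passing text to `on_recognized`.

```python
from typing import Callable, Dict, Optional


class CommandListener:
    """Hypothetical stand-in for an application's use of a speech engine."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}
        self._listening = False

    def define(self, phrase: str, handler: Callable[[], str]) -> None:
        # Step 1: tell the engine what to listen for.
        self._handlers[phrase.lower()] = handler

    def start_listening(self) -> None:
        # Step 2: start listening.
        self._listening = True

    def on_recognized(self, phrase: str) -> Optional[str]:
        # Step 3: act on the message sent when a defined command is heard.
        if not self._listening:
            return None
        handler = self._handlers.get(phrase.lower())
        return handler() if handler else None


listener = CommandListener()
listener.define("show my calendar", lambda: "calendar shown")
listener.start_listening()
result = listener.on_recognized("Show my calendar")
```

Phrases that were never defined, or that arrive before listening starts, produce no action in this sketch.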

Alternatively, you can provide basic speech control of your application by taking advantage of Apple’s Speakable Items application (see “Speakable Items”).

Apple Computer, Inc. June 2002

Speakable Items

Speakable Items, an application built in to the Mac OS X user interface, calls the speech-recognition engine and provides all users with the ability to control their computer by voice. It does this by creating a folder in the user’s Library folder (Library/Speech/Speakable Items). Anything in the Speakable Items folder is launched when the user speaks its name; saying its name is equivalent to double-clicking that item’s icon, except that it works even if the folder is not visible or the Finder is not active.

Developers can add their own items to the Speakable Items folder, such as AppleScript scripts, documents, templates, applications, or aliases. When the user speaks an item’s name, that item is launched.

The Speakable Items folder can also contain XML files that associate spoken commands with keyboard shortcuts. “Make this bold,” for example, sends Command-B; “Copy this to the Clipboard” sends Command-C.

The Speakable Items folder also contains an Application Speakable Items folder, which contains a subfolder for each application, so that you can create spoken commands that apply only to your application. Items in your application’s folder are speakable only when your application is active (frontmost). Alternatively, you can include application-specific speakable items in a folder within your application bundle.
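The scoping described above can be modeled with a small lookup. This is only a sketch: the two global phrases and their shortcuts come from the text, but the application name `PhotoSorter`, its command, and the rule that application-specific commands are checked first are assumptions made for illustration. The actual XML file format is not shown in this chapter, so plain dictionaries stand in for it.

```python
from typing import Dict, Optional

# Phrases that are always speakable (examples taken from the text above).
GLOBAL_COMMANDS: Dict[str, str] = {
    "make this bold": "Command-B",
    "copy this to the clipboard": "Command-C",
}

# Hypothetical per-application commands; the application name and phrase are invented.
APP_COMMANDS: Dict[str, Dict[str, str]] = {
    "PhotoSorter": {"rotate this picture left": "Command-L"},
}


def resolve(phrase: str, frontmost_app: str) -> Optional[str]:
    """Look up a phrase, checking the frontmost application's commands first."""
    key = phrase.lower()
    app_specific = APP_COMMANDS.get(frontmost_app, {})
    if key in app_specific:
        return app_specific[key]
    # Fall back to commands that apply regardless of the active application.
    return GLOBAL_COMMANDS.get(key)
```

A phrase defined only for `PhotoSorter` resolves to nothing when another application is frontmost, which mirrors the rule that application items are speakable only while that application is active.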

The Speech Recognition Interface

Mac OS X provides a consistent, well-integrated user interface for speech recognition across all applications. This interface comprises the following items:

• The Speech pane of System Preferences is where users can control general speech-recognition settings, regardless of which application is using speech recognition. These settings include microphone volume (helpful for using non-Apple microphones) and the listening mode (push-to-talk versus continuous listening). Developers get these interface features for free regardless of how they use the speech-recognition engine. The Speech pane also contains controls specific to the Speakable Items application, such as whether Speakable Items is on or off and whether it applies to menus and window controls.

• The speech feedback window provides information about the level of sound input, whether the system is actively listening, and which listening method the user has chosen.

Figure 16-1 The speech feedback window

• The Speech Commands window shows users what they can say at a specific time. It also displays what the speech-recognition engine “heard” and what it spoke to the user in response.

Figure 16-2 The Speech Commands window

Speech-Recognition Errors

Because speech and sounds can be ambiguous, the speech-recognition process sometimes produces errors. Two such errors are the following:

• A rejection error occurs when the system hears something it considers speech (rather than noise) but can’t match the sound to a known command. By default, this kind of error returns “???”; your application can specify its own rejection word or other response.

• A substitution error occurs when the system incorrectly interprets a sound, for example by recognizing the command “cut” as “quit.”

Because substitution errors are generally more annoying to users than rejection errors, the speech-recognition engine has been tuned to prefer to reject rather than substitute.
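The trade-off between the two error types can be illustrated with a confidence threshold. The sketch below is not how the Mac OS X engine works internally: it uses text similarity from Python's `difflib` as a rough stand-in for acoustic matching, and the `recognize` function and its `threshold` parameter are hypothetical. Raising the threshold makes the matcher reject more often instead of substituting a wrong command.

```python
from difflib import SequenceMatcher
from typing import Iterable

REJECTION_WORD = "???"  # default rejection result mentioned in the text


def recognize(heard: str, commands: Iterable[str], threshold: float = 0.8) -> str:
    """Return the best-matching command, or the rejection word if no match is close enough."""
    best_command = REJECTION_WORD
    best_score = 0.0
    for command in commands:
        # Text similarity is only a proxy here; a real engine compares sounds.
        score = SequenceMatcher(None, heard.lower(), command.lower()).ratio()
        if score > best_score:
            best_command, best_score = command, score
    return best_command if best_score >= threshold else REJECTION_WORD
```

With a strict threshold, hearing "cut" when only "quit" is defined yields the rejection word; with a lenient one, the same input is substituted with "quit".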

With the “Listen for” AppleScript command, you can easily test your application’s spoken commands without writing any code. Consult the Mac OS X Developer Tools CD for examples of using the speech-recognition server’s “Listen for” command.

Guidelines for Implementing Speech Recognition

To minimize speech-recognition errors, observe the following guidelines in designing your spoken interface.

• Avoid commands that sound similar but have different meanings. For example, “Turn backups on” and “Turn backups off” differ by only one phoneme and might be confused by the recognizer in a noisy environment.

• Avoid single-word commands; they are less distinctive and can be confused by the recognizer. “Cut” sounds similar to “Quit,” for example. Phrases of three to six words are more distinctive and will be better recognized.

• Define commands that are easy to remember, feel natural to say, and don’t conflict with menu items or controls.

• Provide speech-recognition commands that add value by doing more than can be accomplished through a single click or keyboard equivalent.
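Some of these guidelines can be checked mechanically while designing a command set. The following is a rough sketch, not an Apple tool: the `check_commands` function and its `max_similarity` cutoff are hypothetical, and spelling similarity from `difflib` is used as an imperfect approximation of how alike two phrases sound.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def check_commands(commands: List[str], max_similarity: float = 0.85) -> List[str]:
    """Return warnings for commands that may be hard to recognize reliably."""
    warnings = []
    for command in commands:
        word_count = len(command.split())
        if word_count == 1:
            warnings.append(f"single-word command: {command!r}")
        elif not 3 <= word_count <= 6:
            warnings.append(f"outside three to six words: {command!r}")
    # Compare every pair of commands; spelling similarity approximates sound similarity.
    for first, second in combinations(commands, 2):
        similarity = SequenceMatcher(None, first.lower(), second.lower()).ratio()
        if similarity >= max_similarity:
            warnings.append(f"similar-sounding pair: {first!r} / {second!r}")
    return warnings
```

Run on the examples from the guidelines, such a check flags “Quit” as a single-word command and “Turn backups on” / “Turn backups off” as a pair that is too similar.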
