
- Lecture 1. Phonetics
- 1.1. Phonetics and communication
- 1.2. Articulation: how sounds are made and classified
- 1.2.1. Consonants
- 1.2.2. Consonants in context
- 1.2.3. Vowels
- 1.2.4. Vowels in context
- 1.3. Language acquisition: how speech sounds are learned
- 1.4. Acoustics: how speech sounds are processed and described
- Summary
1.4. Acoustics: how speech sounds are processed and described
Speech technology is such a fast-developing area of applied phonetics that we cannot predict what the facilities for speech processing will be like tomorrow; whatever we write may be outdated very quickly. Our task, therefore, is to look at the principal ways speech can be observed and analyzed.
In the phonetic laboratory we analyze speech using computers. The acoustic properties of the sound signal are easier to observe than the exact positions of the tongue: we can record the signal and measure it with computer programs designed to process it. For each of the phonetic classes of sound that we have identified we can find corresponding acoustic patterns.
When talking, we produce sound waves which travel through the air and reach the eardrum of the listener. This acoustic signal can be recorded and processed with a computer program, then observed as a waveform on a display.
Vowels, for instance, are periodic sounds: they have a regular pattern of vibration that repeats over and over. Fricatives like /f, s, ʃ/ are aperiodic sounds with an irregular, messy pattern. Voiceless stops (plosives) start with silence (at the complete closure of the mouth), which we see as an absence of signal, a gap. When the closure is released there is an aperiodic sound, like a brief fricative. If /p, t, k/ are aspirated, the aperiodic sound is like [h]. Voiced stops (plosives) are periodic during the time the vocal tract is closed, but the English /b, d, g/ actually have very little voicing (see Figure 9).
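The periodic/aperiodic contrast can be checked numerically. The sketch below (plain Python, with synthetic signals standing in for a recorded vowel and a fricative) uses the peak of the normalized autocorrelation: a periodic wave matches itself almost perfectly at a lag of one period, while noise does not. The sampling rate and frequencies are invented for the example.

```python
import math
import random

def autocorr_peak(signal, min_lag=20, max_lag=200):
    """Peak of the normalized autocorrelation over candidate lags.
    Periodic signals (vowel-like) score close to 1; aperiodic
    signals (fricative-like) score much lower."""
    n = len(signal)
    energy = sum(x * x for x in signal)
    best = 0.0
    for lag in range(min_lag, max_lag):
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best

rate = 8000  # samples per second (illustrative)
# "Vowel": a regular 200 Hz wave, repeating its pattern every 40 samples.
vowel = [math.sin(2 * math.pi * 200 * t / rate) for t in range(2000)]
# "Fricative": an irregular, messy pattern (random noise).
random.seed(0)
fric = [random.uniform(-1, 1) for _ in range(2000)]

print(autocorr_peak(vowel))  # high, close to 1
print(autocorr_peak(fric))   # much lower
```

The same self-similarity idea underlies many pitch detectors: the lag at which the peak occurs gives the period of vibration.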
We can measure the time taken by each particular sound along the horizontal axis; this parameter is called duration. The frequency of vibration and the amplitude (the amount of energy contained in the sound) can be measured in acoustic spectral analysis. The underlying principle is that the complex waveform of a sound can be broken down into simple waveforms of different frequencies (much as white light is broken down into a rainbow pattern of colours), and we can measure the energy at each frequency. In the resulting picture, called a spectrogram, concentrations of energy appear as dark bands called formants; darker bands represent greater energy. The vertical axis represents the frequency scale, the horizontal axis shows time. Thus we can see how energy is distributed across frequencies at any stage of sound production.
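This decomposition can be illustrated with a small discrete-Fourier sketch in plain Python (the sampling rate and component frequencies are invented for the example): a "complex" wave built from two simple waves is probed for energy at chosen frequencies, which amounts to one vertical slice of a spectrogram.

```python
import math

def dft_magnitudes(signal, rate, freqs):
    """Energy of a waveform at chosen frequencies (Hz):
    one spectral slice of the kind a spectrogram stacks over time."""
    n = len(signal)
    mags = {}
    for f in freqs:
        re = sum(signal[t] * math.cos(2 * math.pi * f * t / rate) for t in range(n))
        im = sum(signal[t] * math.sin(2 * math.pi * f * t / rate) for t in range(n))
        mags[f] = math.hypot(re, im) / n
    return mags

rate = 8000
# Complex wave = sum of two simple waves: a strong 200 Hz component
# and a weaker 600 Hz component.
wave = [math.sin(2 * math.pi * 200 * t / rate)
        + 0.4 * math.sin(2 * math.pi * 600 * t / rate)
        for t in range(800)]

print(dft_magnitudes(wave, rate, [200, 400, 600]))
# energy shows up at 200 Hz and 600 Hz, and almost none at 400 Hz
```

In a real analysis this is done with the fast Fourier transform over short successive windows, but the principle is the same: the complex waveform is resolved into its simple components.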
In vowels the energy is concentrated in three or four narrow bands (formants) in the lower part of the spectrum. The formant with the lowest frequency, Formant 1 (F1), corresponds roughly to the traditional open/close dimension: a low F1 corresponds to a close vowel, like /iː/ or /uː/. Formant 2 (F2), which is higher than F1, corresponds roughly to the front/back dimension of vowels: a vowel with a high F2 is likely to be a front vowel like /e/ or /æ/, while a low F2 is more likely to signal a back vowel like /ɒ/ or /ɑː/.
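As a toy illustration, this correspondence can be written down as a rule of thumb. The cutoff values below are invented for the sketch, not measured vowel norms; real formant values vary greatly between speakers and overlap between vowels.

```python
# Toy sketch: reading vowel quality off F1/F2 values in Hz.
# The 400 Hz and 1500 Hz cutoffs are illustrative assumptions, not norms.
def describe_vowel(f1, f2):
    height = "close" if f1 < 400 else "open"      # low F1 -> close vowel
    backness = "front" if f2 > 1500 else "back"   # high F2 -> front vowel
    return f"{height} {backness}"

print(describe_vowel(280, 2250))  # an /i:/-like reading: "close front"
print(describe_vowel(700, 1100))  # an /ɑ:/-like reading: "open back"
```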
Frequency values vary from speaker to speaker, but there are group means (averaged data), for BBC female newscasters, for example, against which the speech of Queen Elizabeth II was compared to trace Her Majesty's progress towards a more democratic national standard pronunciation of vowels.
Prosody (or intonation), among the suprasegmental features (those spreading over a number of segments), can also be analyzed acoustically. The parameters are fundamental frequency (F0), intensity (Int) and duration (T), which correspond to the perceptual categories of pitch, loudness and length.
Fundamental frequency curves show where the voice goes up (a rise) or down (a fall). A change in fundamental frequency may serve to distinguish the meaning of words in tone languages or the meaning of sentences in intonation languages (see Figure 10). It is also one of the major components of accent (stress) in English and Russian.
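A minimal sketch of how a rise or fall might be read off an F0 track (the Hz values here are made up for the example; extracting F0 from a waveform in the first place is a separate step):

```python
def contour_direction(f0_values):
    """Label a fundamental-frequency curve as a 'rise' or a 'fall'
    by comparing where the voice ends up relative to where it began.
    An illustrative simplification of real contour analysis."""
    return "rise" if f0_values[-1] > f0_values[0] else "fall"

statement = [220, 210, 190, 160, 130]  # hypothetical falling tune
question = [180, 185, 200, 240, 280]   # hypothetical rising tune

print(contour_direction(statement))  # "fall"
print(contour_direction(question))   # "rise"
```

Real intonation analysis looks at the whole shape of the curve, not just its endpoints, but the fall/rise opposition is the basic distinction such curves encode.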
The duration of segments, syllables, intonation units and pauses reveals the tempo and rhythm of speech. Duration is also a powerful means of marking stress in Russian and in a few dialects of English where speech is relatively monotone, such as conversational American English or Irish English, for instance.
Measurements of intensity level and change show how speakers employ loudness: British English speakers keep up a high level of intensity, while Russian speakers keep changing it. This suggests that loudness variation is more important for accent in Russian, while pitch change is, no doubt, the major accentual means in English.
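Intensity is usually measured frame by frame as root-mean-square (RMS) amplitude, the acoustic correlate of loudness. A minimal sketch with two synthetic frames (the amplitudes are chosen arbitrarily for the example):

```python
import math

def rms(frame):
    """Root-mean-square amplitude of one frame of samples:
    the standard acoustic measure of intensity (loudness)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Two hypothetical frames: the same wave at a louder and a quieter amplitude.
loud = [0.8 * math.sin(0.1 * t) for t in range(400)]
quiet = [0.2 * math.sin(0.1 * t) for t in range(400)]

print(rms(loud) > rms(quiet))  # True: the louder frame has higher intensity
```

Tracking RMS frame by frame through an utterance gives the intensity curve that the comparisons above rely on.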
An important branch of acoustic phonetics is speech synthesis. Although scientists are still dissatisfied with the quality of synthesized speech, it is a very useful tool for learning more about a listener's reactions to various modifications of sounds and their properties. Acoustic phonetics is also used in the health service, in security systems, and for communication and other technical purposes in aviation and the navy, for instance.
There are phonetic laboratories which employ other instrumental techniques, such as palatography and ultrasound equipment; even X-ray methods used to be practised to find the exact location of the articulators in the process of speaking. The minimal equipment we need today, however, is a recorder and a computer.