- Question 1 Structure of a Speech Coding System
- Question 3 Desirable Properties of a Speech Coder
- Question 4 About Coding Delay
- Question 5 Classification of Speech Coders
- Question 6 Origin of Speech Signals
- Question 7 Structure of the Human Auditory System
- Question 8 Absolute Threshold
- Question 9 Speech Coding Standards
- Question 10 Pitch Period Estimation
- Question 11 Linear Prediction
- Question 12 Error Minimization
- Question 13/14 Prediction Schemes
- Question 15 Long-Term Linear Prediction
- Question 16/17 Linear Predictive Coding (LPC)
- 16. Speech encoding. LPC encoder
- Overview
- LPC coefficient representations
- Applications
- 20/21. Speech encoding. CELP coder
- 22/23. Speech encoding. LD-CELP coder
- 14.1 Strategies to achieve low delay
- 24/25. Speech encoding. ACELP (G.729) coder
- 35. JPEG2000 in video compression (MJPEG)
- 36. Coding for high quality moving pictures (MPEG-2)
20/21. Speech encoding. CELP coder
CODE-EXCITED LINEAR PREDICTION
The CELP coder relies on the long-term and short-term linear prediction models.
Figure 11.1 shows the block diagram of the speech production model, where an
excitation sequence is extracted from the codebook through an index. The
extracted excitation is scaled to the appropriate level and filtered by the
cascade of the pitch synthesis filter and the formant synthesis filter to yield
the synthetic speech. The pitch synthesis filter creates the periodicity in the
signal associated with the fundamental pitch frequency, and the formant
synthesis filter generates the spectral envelope.
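The production model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any standard coder: the function names, the single-tap pitch filter, and the sign conventions (formant filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, pitch filter 1/(1 + b z^-T)) are assumptions made for the example.

```python
import numpy as np

def pitch_synthesis(x, b, T):
    """Single-tap pitch synthesis filter (assumed form 1 / (1 + b z^-T)):
    y[n] = x[n] - b * y[n - T]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] - (b * y[n - T] if n >= T else 0.0)
    return y

def formant_synthesis(x, a):
    """Formant synthesis filter 1 / A(z), assuming
    A(z) = 1 + a[0] z^-1 + ... + a[M-1] z^-M:
    y[n] = x[n] - sum_i a[i-1] * y[n - i]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, min(n, len(a)) + 1):
            acc -= a[i - 1] * y[n - i]
        y[n] = acc
    return y

def celp_synthesize(codebook, index, gain, b, T, a):
    """Codebook lookup -> gain scaling -> pitch filter -> formant filter."""
    excitation = gain * np.asarray(codebook[index], dtype=float)
    return formant_synthesis(pitch_synthesis(excitation, b, T), a)

# Hypothetical toy parameters: a one-entry codebook holding a unit impulse.
toy_codebook = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
speech = celp_synthesize(toy_codebook, index=0, gain=2.0, b=-0.5, T=2, a=[-0.5])
```

With an impulse as excitation, the output shows both effects at once: the pitch filter repeats the pulse every T samples with decaying amplitude, and the formant filter smears each pulse according to the spectral envelope.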
Encoder Operation
A block diagram of a generic CELP encoder is shown in Figure 11.9. This encoder
is highly simplified and serves only as an illustration. Subsequent chapters contain
the details of operation of different standard CELP coders. The encoder works as
follows:
- The input speech signal is segmented into frames and subframes. As explained in
  Chapter 4, the scheme of four subframes in one frame is a popular choice.
  The length of the frame is usually around 20 to 30 ms, while that of the
  subframe is in the range of 5 to 7.5 ms.
- Short-term LP analysis is performed on each frame to yield the LPC.
  Afterward, long-term LP analysis is applied to each subframe (Chapter 4).
  The input to short-term LP analysis is normally the original speech or the
  preemphasized speech; the input to long-term LP analysis is often the
  (short-term) prediction error. The coefficients of the perceptual weighting
  filter, pitch synthesis filter, and modified formant synthesis filter are
  known after this step.
- The excitation sequence can now be determined. The length of each excitation
  codevector is equal to that of the subframe; thus, an excitation codebook
  search is performed once every subframe. The search procedure begins with
  the generation of an ensemble of filtered excitation sequences with the
  corresponding gains; the mean-squared error (or sum of squared errors) is
  computed for each sequence, and the codevector and gain associated with
  the lowest error are selected.
- The excitation codebook index, gain, long-term LP parameters, and LPC
  are encoded, packed, and transmitted as the CELP bit-stream.
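The excitation codebook search in the encoder can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the procedure of any particular standard: it omits the perceptual weighting filter, the pitch synthesis filter, and the filter memory carried over from earlier subframes. The names `synth_filter` and `search_codebook`, the random codebook, and the 40-sample subframe (5 ms at an assumed 8 kHz sampling rate) are hypothetical. For each codevector the gain that minimizes the squared error has the closed form g = <t, y> / <y, y>, where t is the target and y the filtered codevector.

```python
import numpy as np

def synth_filter(x, a):
    """Zero-state all-pole filter 1 / A(z), assuming
    A(z) = 1 + a[0] z^-1 + ... + a[M-1] z^-M."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, min(n, len(a)) + 1):
            acc -= a[i - 1] * y[n - i]
        y[n] = acc
    return y

def search_codebook(target, codebook, a):
    """Return (index, gain, error) minimizing sum((target - gain * y)^2),
    where y is the codevector filtered by the synthesis filter."""
    best_index, best_gain, best_err = -1, 0.0, np.inf
    for k, code in enumerate(codebook):
        y = synth_filter(code, a)
        energy = float(np.dot(y, y))
        if energy == 0.0:
            continue  # an all-zero codevector cannot match any target
        gain = float(np.dot(target, y)) / energy   # optimal gain for this vector
        err = float(np.sum((target - gain * y) ** 2))
        if err < best_err:
            best_index, best_gain, best_err = k, gain, err
    return best_index, best_gain, best_err

# Hypothetical setup: 16 random codevectors, one 40-sample subframe.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 40))
lpc = [-0.9]                                        # toy first-order A(z)
target = 0.7 * synth_filter(codebook[5], lpc)       # target built from entry 5
index, gain, err = search_codebook(target, codebook, lpc)
```

Because the target here is built from codebook entry 5 scaled by 0.7, the search recovers that index and gain with essentially zero error; with real speech the residual error is nonzero and the loop simply keeps the best match.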
Decoder Operation
A block diagram of the CELP decoder is shown in Figure 11.10. The decoder
unpacks and decodes the various parameters from the bit-stream and directs each
to the corresponding block so as to synthesize the speech. A postfilter is added at
the end to enhance the quality of the resultant signal; the structure of this filter is
described in Section 11.5.
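To make the packing and unpacking step concrete, here is a small round-trip sketch. The field names and bit widths in `FIELDS` are entirely hypothetical and do not correspond to any standard CELP bit allocation; a real coder also sends several subframes of excitation and pitch parameters per frame, which is left out here for brevity.

```python
# Hypothetical bit allocation for one subframe: (field name, width in bits).
FIELDS = [
    ("lpc_index", 10),   # quantized LPC, treated as a single index
    ("pitch_lag", 7),    # long-term LP parameter: pitch period
    ("pitch_gain", 3),   # long-term LP parameter: gain index
    ("code_index", 10),  # excitation codebook index
    ("code_gain", 5),    # excitation gain index
]

def pack(params):
    """Encoder side: concatenate the quantization indices into one integer."""
    word = 0
    for name, width in FIELDS:
        value = params[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack(word):
    """Decoder side: split the integer back into indices, last field first."""
    params = {}
    for name, width in reversed(FIELDS):
        params[name] = word & ((1 << width) - 1)
        word >>= width
    return params

sent = {"lpc_index": 513, "pitch_lag": 57, "pitch_gain": 4,
        "code_index": 1000, "code_gain": 17}
received = unpack(pack(sent))
```

After unpacking, each index would be mapped back to its decoded parameter and routed to the matching block (excitation codebook, gain, pitch synthesis filter, formant synthesis filter) before the postfilter.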
22/23. Speech encoding. LD-CELP coder
LOW-DELAY CELP
In the process of speech encoding and decoding, delay is inevitably introduced.
Loosely defined, delay is the time shift between the speech signal at
the input of the encoder and the synthetic speech at the output of the
decoder, when the output of the encoder is directly connected to the input of the
decoder. For schemes such as PCM and ADPCM (Chapter 6), the speech signal
is encoded on a sample-by-sample basis: a few bits are found for each sample,
and the result is transmitted immediately; the delay associated with these schemes
is negligible.
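A rough calculation shows why sample-by-sample coding adds almost no delay compared with the 20 to 30 ms frames quoted earlier for CELP. The sketch below counts only the time needed to collect one block of input samples; the 8 kHz sampling rate and the function name `buffering_delay_ms` are assumptions for illustration, and the total delay of a real coder also includes processing and transmission time.

```python
def buffering_delay_ms(samples_per_block, sample_rate_hz=8000):
    """Time in milliseconds needed to collect one block of input samples."""
    return 1000.0 * samples_per_block / sample_rate_hz

per_sample = buffering_delay_ms(1)     # PCM/ADPCM: one sample at a time
per_frame = buffering_delay_ms(160)    # frame-based coder: 160-sample frame
```

At the assumed 8 kHz rate, one sample takes 0.125 ms to arrive, while a 160-sample frame must be buffered for 20 ms before encoding can even begin.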