6 Functional description of the decoder

3GPP TS 06.90: Adaptive Multi-Rate speech transcoding

The function of the decoder consists of decoding the transmitted parameters (LP parameters, adaptive codebook vector, adaptive codebook gain, fixed codebook vector, fixed codebook gain) and performing synthesis to obtain the reconstructed speech. The reconstructed speech is then post‑filtered and upscaled. The signal flow at the decoder is shown in figure 4.

6.1 Decoding and speech synthesis

The decoding process is performed in the following order:

Decoding of LP filter parameters: The received indices of LSP quantization are used to reconstruct the quantified LSP vectors. The interpolation described in subclause 5.2.6 is performed to obtain 4 interpolated LSP vectors (corresponding to 4 subframes). For each subframe, the interpolated LSP vector is converted to the LP filter coefficient domain $a_k$, which is used for synthesizing the reconstructed speech in the subframe.

The following steps are repeated for each subframe:

1) Decoding of the adaptive codebook vector: The received pitch index (adaptive codebook index) is used to find the integer and fractional parts of the pitch lag. The adaptive codebook vector is found by interpolating the past excitation (at the pitch delay) using the FIR filter described in subclause 5.6.

2) Decoding of the innovative codebook vector: The received algebraic codebook index is used to extract the positions and amplitudes (signs) of the excitation pulses and to find the algebraic codevector $c(n)$. If the integer part of the pitch lag, $T$, is less than the subframe size 40, the pitch sharpening procedure is applied, which translates into modifying $c(n)$ by $c(n) = c(n) + \beta c(n-T)$, where $\beta$ is the decoded pitch gain, $\hat{g}_p$, bounded by [0.0, 1.0] or [0.0, 0.8], depending on mode.
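The pitch sharpening in step 2) can be illustrated with a minimal floating-point sketch. It assumes the update $c(n) = c(n) + \beta c(n-T)$ is applied in place for $n = T, \ldots, 39$, so that already sharpened samples feed later ones. The function name `pitch_sharpen` and the parameter `beta_max` are illustrative and do not come from this specification.

```python
SUBFRAME_SIZE = 40  # samples per subframe


def pitch_sharpen(c, T, g_p, beta_max=1.0):
    """Return a pitch-sharpened copy of the algebraic codevector c.

    c        -- list of SUBFRAME_SIZE samples
    T        -- integer part of the pitch lag
    g_p      -- decoded pitch gain
    beta_max -- upper bound on beta (1.0 or 0.8, depending on mode)
    """
    # beta is the decoded pitch gain bounded to [0.0, beta_max].
    beta = min(max(g_p, 0.0), beta_max)
    out = list(c)
    if T < SUBFRAME_SIZE:
        # Assumption: in-place recursion, out[n - T] may already be sharpened.
        for n in range(T, SUBFRAME_SIZE):
            out[n] += beta * out[n - T]
    return out
```

With a single pulse at position 0 and $T = 16$, the sketch adds scaled copies of the pulse at positions 16 and 32; with $T \geq 40$ the codevector is returned unchanged.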

3) Decoding of the adaptive and fixed codebook gains: In case of scalar quantization of the gains (12.2 kbit/s and 7.95 kbit/s modes), the received indices are used to readily find the quantified adaptive codebook gain, $\hat{g}_p$, and the quantified fixed codebook gain correction factor, $\hat{\gamma}_{gc}$, from the corresponding quantization tables. In case of vector quantization of the gains (all other modes), the received index gives both the quantified adaptive codebook gain, $\hat{g}_p$, and the quantified fixed codebook gain correction factor, $\hat{\gamma}_{gc}$. The estimated fixed codebook gain $g'_c$ is found as described in subclause 5.7. First, the predicted energy is found by:

$$\tilde{E}(n) = \sum_{i=1}^{4} b_i \hat{R}(n-i) \qquad (66)$$

and then the mean innovation energy is found by:

$$E_I = 10 \log_{10}\left( \frac{1}{N} \sum_{j=0}^{N-1} c^2(j) \right). \qquad (67)$$

The predicted gain is found by:

$$g'_c = 10^{0.05\left(\tilde{E}(n) + \bar{E} - E_I\right)}. \qquad (68)$$

The quantified fixed codebook gain is given by:

$$\hat{g}_c = \hat{\gamma}_{gc}\, g'_c. \qquad (69)$$
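The gain decoding in step 3) can be sketched as follows. The sketch assumes the forms $\tilde{E}(n) = \sum_i b_i \hat{R}(n-i)$, $E_I = 10\log_{10}\big(\tfrac{1}{N}\sum_j c^2(j)\big)$, $g'_c = 10^{0.05(\tilde{E}(n) + \bar{E} - E_I)}$ and $\hat{g}_c = \hat{\gamma}_{gc} g'_c$ for (66) to (69). The predictor coefficients, the past prediction errors and the mode-dependent mean energy are defined in subclause 5.7 and are passed in as arguments here; the function name and the small energy floor are illustrative additions.

```python
import math


def decode_fixed_gain(c, gamma_gc, past_errors, b, mean_energy_db):
    """Return (predicted gain g'_c, quantified gain g_c) for one subframe.

    c              -- algebraic codevector (N samples)
    gamma_gc       -- decoded fixed codebook gain correction factor
    past_errors    -- [R(n-1), R(n-2), R(n-3), R(n-4)], in dB
    b              -- MA predictor coefficients [b1, b2, b3, b4]
    mean_energy_db -- mode-dependent mean innovation energy, in dB
    """
    # (66): predicted energy from the past quantified prediction errors.
    predicted_energy = sum(bi * ri for bi, ri in zip(b, past_errors))
    # (67): mean innovation energy in dB.
    energy = sum(x * x for x in c) / len(c)
    # Assumption: tiny floor to avoid log10(0); not part of the text above.
    innovation_energy = 10.0 * math.log10(max(energy, 1e-12))
    # (68): predicted fixed codebook gain.
    g_c_pred = 10.0 ** (0.05 * (predicted_energy + mean_energy_db
                                - innovation_energy))
    # (69): apply the decoded correction factor.
    return g_c_pred, gamma_gc * g_c_pred
```

For a unit-energy codevector, zero past errors and a mean energy of 20 dB, the predicted gain evaluates to 10 and a correction factor of 0.5 gives a quantified gain of 5.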

4) Smoothing of the fixed codebook gain (10.2, 6.70, 5.90, 5.15, 4.75 kbit/s modes): An adaptive smoothing of the fixed codebook gain is performed to avoid unnatural fluctuations in the energy contour. The smoothing is based on a measure of the stationarity of the short-term spectrum in the q domain. The smoothing strength is computed from this measure. An averaged q-value is computed for each frame n by:

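$$\bar{q}(n) = 0.84\, \bar{q}(n-1) + 0.16\, \hat{q}_4(n). \qquad (70)$$

For each subframe $m$, a difference measure between the averaged vector and the quantized and interpolated vector is computed by:

$$\mathrm{diff}_m = \sum_{j} \frac{\left| \bar{q}_j(n) - \hat{q}_{m,j}(n) \right|}{\bar{q}_j(n)}, \qquad (71)$$

where $j$ runs over the 10 LSPs. Furthermore, a smoothing factor, $k_m$, is computed by:

$$k_m = \min\left(K_2, \max\left(0, \mathrm{diff}_m - K_1\right)\right) / K_2, \qquad (72)$$

where the constants are set to $K_1 = 0.4$ and $K_2 = 0.25$. A hangover period of 40 subframes is used where the $k_m$-value is set to 1.0 if $\mathrm{diff}_m$ has been above 0.65 for 10 consecutive frames. A value of 1.0 corresponds to no smoothing. An averaged fixed codebook gain value is computed for each subframe by:

$$\bar{g}_c(m) = \frac{1}{5} \sum_{i=0}^{4} \hat{g}_c(m-i). \qquad (73)$$

The fixed codebook gain used for synthesis is now replaced by a smoothed value given by:

$$\hat{g}_c = \hat{g}_c \cdot k_m + \bar{g}_c(m) \cdot (1 - k_m). \qquad (74)$$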
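The smoothing in step 4) can be sketched in floating point as below. The sketch assumes a recursive LSP average with weights 0.84 and 0.16, a relative-difference measure summed over the 10 LSPs, constants $K_1 = 0.4$ and $K_2 = 0.25$, and a five-subframe gain average. The hangover logic is deliberately left out, and all function names are illustrative.

```python
K1, K2 = 0.4, 0.25  # assumed constants of the smoothing factor


def update_lsp_average(q_avg_prev, q4):
    """(70): per-frame recursive average of the quantified LSP vector."""
    return [0.84 * a + 0.16 * q for a, q in zip(q_avg_prev, q4)]


def smoothing_factor(q_avg, q_m):
    """(71), (72): return (diff_m, k_m) for the LSP vector of subframe m."""
    diff = sum(abs(a - q) / a for a, q in zip(q_avg, q_m))
    k = min(K2, max(0.0, diff - K1)) / K2
    return diff, k


def smooth_gain(g_c_history, k):
    """(73), (74): blend the current gain with the five-subframe average.

    g_c_history -- decoded gains, most recent first (at least 5 values)
    k           -- smoothing factor; 1.0 means no smoothing
    """
    g_avg = sum(g_c_history[:5]) / 5.0
    return g_c_history[0] * k + g_avg * (1.0 - k)
```

When the subframe LSP vector equals the average, `smoothing_factor` returns $k_m = 0$ and the synthesis gain becomes the five-subframe average; a large spectral change drives $k_m$ to 1.0 and leaves the decoded gain untouched.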

5) Anti-sparseness processing (7.95, 6.70, 5.90, 5.15, 4.75 kbit/s modes): An adaptive anti-sparseness post-processing procedure is applied to the fixed codebook vector $c(n)$ in order to reduce perceptual artifacts arising from the sparseness of the algebraic fixed codebook vectors with only a few non-zero samples per subframe. The anti-sparseness processing consists of circular convolution of the fixed codebook vector with an impulse response. Three pre-stored impulse responses are used and a number $\mathrm{impNr} = 0, 1, 2$ is set to select one of them. A value of 2 corresponds to no modification, a value of 1 corresponds to medium modification, while a value of 0 corresponds to strong modification. The selection of the impulse response is performed adaptively from the adaptive and fixed codebook gains. The following procedure is employed:

Detect onset by comparing the fixed codebook gain to the previous fixed codebook gain. If the current value is more than twice the previous value an onset is detected.

If not onset and $\mathrm{impNr} = 0$, the median filtered value of the current and the previous 4 adaptive codebook gains is computed. If this value is less than 0.6, $\mathrm{impNr} = 0$.

If not onset, the $\mathrm{impNr}$-value is restricted to increase by at most one step from the previous subframe.

If an onset is declared, the $\mathrm{impNr}$-value is increased by one if it is less than 2.
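The selection procedure of step 5) and the circular convolution can be sketched as below. Several points are assumptions rather than statements of this clause: the initial choice of the selector from the current adaptive codebook gain (thresholds `low_thr` and `high_thr`), the unconditional median check when no onset is detected, and all function and parameter names. The three pre-stored impulse responses are not reproduced here.

```python
from statistics import median


def select_impulse_response(g_p, g_p_history, g_c, g_c_prev, imp_prev,
                            low_thr=0.6, high_thr=0.9):
    """Return the selector in {0, 1, 2} (0 = strong, 1 = medium, 2 = none).

    g_p         -- current adaptive codebook gain
    g_p_history -- previous 4 adaptive codebook gains
    g_c, g_c_prev -- current and previous fixed codebook gains
    imp_prev    -- selector value of the previous subframe
    """
    # Assumption: initial choice from the current gain with these thresholds.
    if g_p < low_thr:
        imp_nr = 0
    elif g_p < high_thr:
        imp_nr = 1
    else:
        imp_nr = 2

    # Onset: current fixed codebook gain more than twice the previous one.
    onset = g_c > 2.0 * g_c_prev

    if not onset:
        # Median of the current and the previous 4 adaptive codebook gains.
        if median([g_p] + list(g_p_history[:4])) < 0.6:
            imp_nr = 0
        # Allow at most one step up from the previous subframe.
        imp_nr = min(imp_nr, imp_prev + 1)
    elif imp_nr < 2:
        imp_nr += 1
    return imp_nr


def circular_convolve(c, h):
    """Circularly convolve codevector c with impulse response h."""
    n_len = len(c)
    out = [0.0] * n_len
    for i, ci in enumerate(c):
        if ci != 0.0:  # sparse codevector: skip zero samples
            for j, hj in enumerate(h):
                out[(i + j) % n_len] += ci * hj
    return out
```

Because the codevector has only a few non-zero pulses, the convolution loop skips zero samples; contributions that run past the end of the subframe wrap around to its start.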

6) Computing the reconstructed speech: The excitation at the input of the synthesis filter is given by:

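$$u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n). \qquad (75)$$

Before the speech synthesis, a post‑processing of excitation elements is performed. This means that the total excitation is modified by emphasizing the contribution of the adaptive codebook vector:

$$\hat{u}(n) = \begin{cases} u(n) + 0.25\, \beta\, \hat{g}_p\, v(n), & \hat{g}_p > 0.5,\ \text{12.2 kbit/s mode} \\ u(n) + 0.5\, \beta\, \hat{g}_p\, v(n), & \hat{g}_p > 0.5,\ \text{all other modes} \\ u(n), & \hat{g}_p \le 0.5 \end{cases} \qquad (76)$$

Adaptive gain control (AGC) is used to compensate for the gain difference between the non‑emphasized excitation $u(n)$ and the emphasized excitation $\hat{u}(n)$. The gain scaling factor $\eta$ for the emphasized excitation is computed by:

$$\eta = \begin{cases} \sqrt{\dfrac{\sum_{n=0}^{39} u^2(n)}{\sum_{n=0}^{39} \hat{u}^2(n)}}, & \hat{g}_p > 0.5 \\ 1.0, & \hat{g}_p \le 0.5 \end{cases} \qquad (77)$$

The gain‑scaled emphasized excitation signal $\hat{u}'(n)$ is given by:

$$\hat{u}'(n) = \eta\, \hat{u}(n). \qquad (78)$$

The reconstructed speech for the subframe of size 40 is given by:

$$\hat{s}(n) = \hat{u}'(n) - \sum_{i=1}^{10} \hat{a}_i\, \hat{s}(n-i), \qquad n = 0, \ldots, 39, \qquad (79)$$

where $\hat{a}_i$ are the interpolated LP filter coefficients.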
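Step 6) can be sketched in floating point as follows. The sketch assumes that the emphasis adds $0.25\,\beta\,\hat{g}_p v(n)$ in the 12.2 kbit/s mode and $0.5\,\beta\,\hat{g}_p v(n)$ in the other modes when $\hat{g}_p > 0.5$, that the AGC factor is the square root of the energy ratio of the two excitations, and that the synthesis filter is $1/\hat{A}(z)$ with $\hat{A}(z) = 1 + \sum_i \hat{a}_i z^{-i}$. Function names and the `mode_122` flag are illustrative.

```python
import math


def build_excitation(v, c, g_p, g_c, beta, mode_122=False):
    """Return (u, u_scaled): total excitation and its emphasized,
    gain-scaled version for one subframe."""
    # (75): total excitation.
    u = [g_p * vi + g_c * ci for vi, ci in zip(v, c)]
    if g_p <= 0.5:
        return u, list(u)  # no emphasis, scaling factor 1.0
    # (76): emphasize the adaptive codebook contribution.
    w = 0.25 if mode_122 else 0.5
    u_emph = [ui + w * beta * g_p * vi for ui, vi in zip(u, v)]
    # (77): AGC factor restoring the energy of u(n).
    num = sum(x * x for x in u)
    den = sum(x * x for x in u_emph)
    eta = math.sqrt(num / den) if den > 0.0 else 1.0
    # (78): gain-scaled emphasized excitation.
    return u, [eta * x for x in u_emph]


def synthesize(excitation, a, memory):
    """(79): all-pole synthesis s(n) = x(n) - sum_i a[i-1] * s(n-i).

    a      -- [a1, ..., a10], interpolated LP coefficients
    memory -- last len(a) output samples, most recent last
    """
    hist = list(memory)
    out = []
    for x in excitation:
        s = x - sum(ai * hist[-1 - i] for i, ai in enumerate(a))
        out.append(s)
        hist.append(s)
    return out
```

The scaled excitation has the same energy as the unmodified excitation, so the emphasis changes the balance between the adaptive and fixed contributions without changing the subframe level.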

7) Additional instability protection: An additional instability protection is implemented in the speech decoder, which monitors overflows in the synthesis filter. If an overflow has occurred in the synthesis part, the whole adaptive codebook memory, $v(n)$, is scaled down by a factor of 4, and the synthesis filtering is repeated using this down-scaled memory. That is, in this case step 6) is repeated, except that the post-processing in (76) – (78) of the excitation signal is by-passed.

The synthesized speech is then passed through an adaptive postfilter which is described in the following clause.
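The overflow handling of step 7) can be sketched as below. The sketch assumes that an overflow means a synthesis output sample outside the 16-bit range, and that the adaptive codebook memory is a buffer whose last 40 samples hold the current non-emphasized excitation. Both points, together with all names, are assumptions made for illustration.

```python
INT16_MAX, INT16_MIN = 32767, -32768


def synth_filter(excitation, a, memory):
    """All-pole synthesis with saturation; returns (speech, overflow)."""
    hist = list(memory)
    out = []
    overflow = False
    for x in excitation:
        s = x - sum(ai * hist[-1 - i] for i, ai in enumerate(a))
        if s > INT16_MAX or s < INT16_MIN:
            overflow = True
            s = max(INT16_MIN, min(INT16_MAX, s))  # saturate
        out.append(s)
        hist.append(s)
    return out, overflow


def synthesize_protected(exc_buffer, exc_emphasized, a, memory, subframe=40):
    """Synthesize one subframe; on overflow, scale the whole excitation
    buffer down by 4 and repeat without the excitation post-processing."""
    speech, overflow = synth_filter(exc_emphasized, a, memory)
    if overflow:
        exc_buffer[:] = [x / 4.0 for x in exc_buffer]
        # Repeat with the down-scaled, non-emphasized excitation.
        speech, _ = synth_filter(exc_buffer[-subframe:], a, memory)
    return speech, overflow
```

The buffer is modified in place, so later subframes build their adaptive codebook vectors from the down-scaled history.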

6.2 Post‑processing

6.2.1 Adaptive post‑filtering (all modes)

The adaptive postfilter is the cascade of two filters: a formant postfilter, and a tilt compensation filter. The postfilter is updated every subframe of 5 ms.

The formant postfilter is given by:

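$$H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)} \qquad (80)$$

where $\hat{A}(z)$ is the received quantified (and interpolated) LP inverse filter (LP analysis is not performed at the decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of the formant post‑filtering.

Finally, the filter $H_t(z)$ compensates for the tilt in the formant postfilter $H_f(z)$ and is given by:

$$H_t(z) = 1 - \mu z^{-1} \qquad (81)$$

where $\mu = \gamma_t k_1'$ is a tilt factor, with $k_1'$ being the first reflection coefficient calculated on the truncated ($L_h = 22$) impulse response, $h_f(n)$, of the filter $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$. $k_1'$ is given by:

$$k_1' = \frac{r_h(1)}{r_h(0)}; \qquad r_h(i) = \sum_{j=0}^{L_h - i - 1} h_f(j)\, h_f(j+i). \qquad (82)$$

The post‑filtering process is performed as follows. First, the synthesized speech $\hat{s}(n)$ is inverse filtered through $\hat{A}(z/\gamma_n)$ to produce the residual signal $\hat{r}(n)$. The signal $\hat{r}(n)$ is filtered by the synthesis filter $1/\hat{A}(z/\gamma_d)$. Finally, the signal at the output of the synthesis filter $1/\hat{A}(z/\gamma_d)$ is passed to the tilt compensation filter $H_t(z)$, resulting in the post‑filtered speech signal $\hat{s}_f(n)$.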
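The three filtering stages described above can be sketched in floating point as follows. The sketch assumes the convention $\hat{A}(z) = 1 + \sum_i \hat{a}_i z^{-i}$, so that $\hat{A}(z/\gamma)$ has coefficients $\hat{a}_i \gamma^i$, a tilt filter of the form $1 - \mu z^{-1}$ with $\mu = \gamma_t k_1'$, and $k_1' = r_h(1)/r_h(0)$ computed on a 22-sample truncated impulse response. The state layout and all names are illustrative, and the gain control stage is not included.

```python
L_H = 22  # assumed truncation length of the impulse response


def weight(a, gamma):
    """Coefficients of A(z/gamma), without the leading 1."""
    return [ai * gamma ** (i + 1) for i, ai in enumerate(a)]


def _fir(x, a, hist):
    """y(n) = x(n) + sum_i a[i-1] x(n-i); hist = past inputs, newest last."""
    buf = list(hist) + list(x)
    p = len(hist)
    return [buf[p + n] + sum(ai * buf[p + n - 1 - i] for i, ai in enumerate(a))
            for n in range(len(x))]


def _iir(x, a, hist):
    """y(n) = x(n) - sum_i a[i-1] y(n-i); hist = past outputs, newest last."""
    buf = list(hist)
    for xn in x:
        buf.append(xn - sum(ai * buf[-1 - i] for i, ai in enumerate(a)))
    return buf[len(hist):]


def postfilter(s, a, gamma_n, gamma_d, gamma_t, state):
    """Post-filter one subframe s; state holds 's' and 'y' memories."""
    order = len(a)
    num, den = weight(a, gamma_n), weight(a, gamma_d)
    # Truncated impulse response of A(z/gamma_n)/A(z/gamma_d) and k1'.
    zeros = [0.0] * order
    h = _iir(_fir([1.0] + [0.0] * (L_H - 1), num, zeros), den, zeros)
    r0 = sum(x * x for x in h)
    r1 = sum(h[j] * h[j + 1] for j in range(L_H - 1))
    mu = gamma_t * r1 / r0
    # Inverse filter, synthesis filter, then tilt compensation.
    res = _fir(s, num, state["s"])
    y = _iir(res, den, state["y"])
    out = _fir(y, [-mu], [state["y"][-1]])
    # Keep the last `order` samples as memory for the next subframe.
    state["s"] = (state["s"] + list(s))[-order:]
    state["y"] = (state["y"] + y)[-order:]
    return out
```

A caller would initialize `state = {"s": [0.0] * 10, "y": [0.0] * 10}` once and pass the same dictionary for every subframe. When `gamma_n` equals `gamma_d` the formant filter reduces to the identity, which is a convenient sanity check.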

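Adaptive gain control (AGC) is used to compensate for the gain difference between the synthesized speech signal $\hat{s}(n)$ and the post‑filtered signal $\hat{s}_f(n)$. The gain scaling factor $\gamma_{sc}$ for the present subframe is computed by:

$$\gamma_{sc} = \sqrt{\frac{\sum_{n=0}^{39} \hat{s}^2(n)}{\sum_{n=0}^{39} \hat{s}_f^2(n)}}. \qquad (83)$$

The gain‑scaled post‑filtered signal $\hat{s}'(n)$ is given by:

$$\hat{s}'(n) = \beta_{sc}(n)\, \hat{s}_f(n) \qquad (84)$$

where $\beta_{sc}(n)$ is updated on a sample‑by‑sample basis and given by:

$$\beta_{sc}(n) = \alpha\, \beta_{sc}(n-1) + (1 - \alpha)\, \gamma_{sc} \qquad (85)$$

where $\alpha$ is an AGC factor with a value of 0.9.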
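The gain control can be sketched as below, assuming the subframe scaling factor is the square root of the energy ratio between the synthesized and the post-filtered signals, and that the applied gain is a first-order recursive average of that factor, updated per sample with factor 0.9. The function name, the returned state and the handling of an all-zero post-filtered subframe are illustrative choices.

```python
import math


def agc(s, s_f, beta_prev, alpha=0.9):
    """Scale post-filtered subframe s_f to the level of s.

    Returns (scaled signal, last smoothed gain) so the gain can be
    carried over to the next subframe.
    """
    num = sum(x * x for x in s)
    den = sum(x * x for x in s_f)
    if den == 0.0:
        # Assumption: nothing to scale when the post-filtered signal is zero.
        return list(s_f), beta_prev
    gamma_sc = math.sqrt(num / den)                     # (83)
    out = []
    beta = beta_prev
    for x in s_f:
        beta = alpha * beta + (1.0 - alpha) * gamma_sc  # (85)
        out.append(beta * x)                            # (84)
    return out, beta
```

Because the gain is smoothed sample by sample, a change in the energy ratio between subframes is applied gradually rather than as a step at the subframe boundary.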

12.2, 10.2 kbit/s modes

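The adaptive post‑filtering factors are given by: $\gamma_n = 0.7$, $\gamma_d = 0.75$ and

$$\gamma_t = \begin{cases} 0.8, & k_1' > 0 \\ 0, & \text{otherwise.} \end{cases} \qquad (86)$$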

7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes

The adaptive post‑filtering factors are given by: $\gamma_n = 0.55$, $\gamma_d = 0.7$ and $\gamma_t = 0.8$.
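The mode-dependent choice of the post-filtering factors can be written as a small lookup. The numeric values below are assumptions of this sketch (0.7, 0.75 and a tilt factor of 0.8 only for a positive first reflection coefficient in the 12.2 and 10.2 kbit/s modes; 0.55, 0.7 and 0.8 in the other modes), and the string mode labels are illustrative.

```python
HIGH_RATE_MODES = ("12.2", "10.2")  # mode labels are illustrative strings


def postfilter_factors(mode, k1):
    """Return (gamma_n, gamma_d, gamma_t) for the given mode.

    mode -- bit rate label in kbit/s, e.g. "12.2" or "5.90"
    k1   -- first reflection coefficient k1' of the formant postfilter
    """
    if mode in HIGH_RATE_MODES:
        # Tilt compensation only for a positive reflection coefficient.
        gamma_t = 0.8 if k1 > 0.0 else 0.0
        return 0.7, 0.75, gamma_t
    return 0.55, 0.7, 0.8
```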

6.2.2 High-pass filtering and up-scaling (all modes)

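The high-pass filter serves as a precaution against undesired low frequency components. A filter cut-off frequency of 60 Hz is used, and the filter is given by:

$$H_{h2}(z) = \frac{0.939819335 - 1.879638672\, z^{-1} + 0.939819335\, z^{-2}}{1 - 1.933105469\, z^{-1} + 0.935913085\, z^{-2}}. \qquad (87)$$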

Up‑scaling consists of multiplying the post‑filtered speech by a factor of 2 to compensate for the down‑scaling by 2 which is applied to the input signal.
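The final stage can be sketched as a second-order recursive filter followed by the multiplication by 2. The coefficients in the sketch are assumptions (a 60 Hz high-pass at 8 kHz sampling with a numerator of 0.939819335, −1.879638672, 0.939819335 and a denominator of 1, −1.933105469, 0.935913085); the function name and the state layout are illustrative.

```python
# Assumed coefficients of the 60 Hz high-pass filter.
B = (0.939819335, -1.879638672, 0.939819335)
A = (1.0, -1.933105469, 0.935913085)


def highpass_upscale(x, state=None):
    """High-pass filter x and multiply the result by 2.

    state -- [x(n-1), x(n-2), y(n-1), y(n-2)] from the previous call,
             or None to start from silence
    Returns (output samples, updated state).
    """
    x1, x2, y1, y2 = state or (0.0,) * 4
    out = []
    for xn in x:
        # Direct-form difference equation of the second-order filter.
        yn = B[0] * xn + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(2.0 * yn)  # up-scaling by a factor of 2
    return out, [x1, x2, y1, y2]
```

Passing the returned state into the next call lets the filter run across subframe boundaries without discontinuities. A constant input decays towards zero at the output, which reflects the suppression of low frequency components.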