3GPP TS 06.60: Enhanced full rate speech transcoding
The function of the decoder consists of decoding the transmitted parameters (LP parameters, adaptive codebook vector, adaptive codebook gain, fixed codebook vector, fixed codebook gain) and performing synthesis to obtain the reconstructed speech. The reconstructed speech is then post‑filtered and upscaled. The signal flow at the decoder is shown in figure 4.
6.1 Decoding and speech synthesis
The decoding process is performed in the following order:
Decoding of LP filter parameters: The received indices of LSP quantization are used to reconstruct the two quantified LSP vectors. The interpolation described in clause 5.2.6 is performed to obtain 4 interpolated LSP vectors (corresponding to 4 subframes). For each subframe, the interpolated LSP vector is converted to LP filter coefficient domain, which is used for synthesizing the reconstructed speech in the subframe.
The following steps are repeated for each subframe:
1) Decoding of the adaptive codebook vector: The received pitch index (adaptive codebook index) is used to find the integer and fractional parts of the pitch lag. The adaptive codebook vector is found by interpolating the past excitation (at the pitch delay) using the FIR filter described in clause 5.6.
2) Decoding of the adaptive codebook gain: The received index is used to find the quantified adaptive codebook gain, $\hat{g}_p$, directly from the quantization table.
3) Decoding of the innovative codebook vector: The received algebraic codebook index is used to extract the positions and amplitudes (signs) of the excitation pulses and to find the algebraic codevector $c(n)$. If the integer part of the pitch lag, $T$, is less than the subframe size 40, the pitch sharpening procedure is applied, which translates into modifying $c(n)$ by $c(n) = c(n) + \beta c(n - T)$, where $\beta$ is the decoded pitch gain, $\hat{g}_p$, bounded by [0.0, 1.0].
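4) Decoding of the fixed codebook gain: The received index gives the fixed codebook gain correction factor $\hat{\gamma}_{gc}$. The estimated fixed codebook gain $g'_c$ is found as described in clause 5.7. First, the predicted energy is found by:
$$\tilde{E}(n) = \sum_{i=1}^{4} b_i \hat{R}(n - i)$$
and then the mean innovation energy is found by:
$$E_I = 10 \log_{10}\left(\frac{1}{N} \sum_{j=0}^{N-1} c^2(j)\right)$$
The predicted gain is found by:
$$g'_c = 10^{0.05\,(\tilde{E}(n) + \bar{E} - E_I)}$$
The quantified fixed codebook gain is given by:
$$\hat{g}_c = \hat{\gamma}_{gc}\, g'_c$$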
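5) Computing the reconstructed speech: The excitation at the input of the synthesis filter, built from the adaptive codebook vector $v(n)$ and the algebraic codevector $c(n)$, is given by:
$$u(n) = \hat{g}_p v(n) + \hat{g}_c c(n)$$
Before the speech synthesis, a post‑processing of excitation elements is performed. This means that the total excitation is modified by emphasizing the contribution of the adaptive codebook vector:
$$\hat{u}(n) = \begin{cases} u(n) + 0.25\, \beta\, \hat{g}_p v(n), & \hat{g}_p > 0.5 \\ u(n), & \hat{g}_p \le 0.5 \end{cases}$$
Adaptive gain control (AGC) is used to compensate for the gain difference between the non‑emphasized excitation $u(n)$ and the emphasized excitation $\hat{u}(n)$. The gain scaling factor $\eta$ for the emphasized excitation is computed by:
$$\eta = \begin{cases} \sqrt{\dfrac{\sum_{n=0}^{39} u^2(n)}{\sum_{n=0}^{39} \hat{u}^2(n)}}, & \hat{g}_p > 0.5 \\ 1.0, & \hat{g}_p \le 0.5 \end{cases}$$
The gain‑scaled emphasized excitation signal $\hat{u}'(n)$ is given by:
$$\hat{u}'(n) = \eta\, \hat{u}(n)$$
The reconstructed speech for the subframe of size 40 is given by:
$$\hat{s}(n) = \hat{u}'(n) - \sum_{i=1}^{10} \hat{a}_i \hat{s}(n - i), \quad n = 0, \ldots, 39$$
where $\hat{a}_i$ are the interpolated LP filter coefficients.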
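Steps 3) to 5) above can be sketched in a few lines of floating-point Python. This is an illustrative sketch only, not the bit-exact fixed-point procedure: the function names are hypothetical, the MA prediction coefficients and mean energy from clause 5.7 are passed in as arguments rather than assumed, and the emphasis factor 0.25 and threshold 0.5 are taken as assumptions about the constants in the emphasis rule. The small energy floor that guards the logarithm is also an assumption of this sketch.

```python
import math

SUBFRAME = 40   # subframe size in samples
LP_ORDER = 10   # order of the LP synthesis filter


def sharpen_pitch(c, lag, beta):
    """Step 3: c(n) <- c(n) + beta * c(n - T), applied only when T < 40."""
    beta = min(max(beta, 0.0), 1.0)   # decoded pitch gain bounded to [0.0, 1.0]
    out = list(c)
    if lag < SUBFRAME:
        for n in range(lag, SUBFRAME):
            out[n] += beta * out[n - lag]   # updated in place, sample by sample
    return out


def decode_fixed_gain(c, past_err_db, ma_coeffs, mean_energy_db, gamma_gc):
    """Step 4: predicted gain g'_c scaled by the received correction factor."""
    pred_energy = sum(b * r for b, r in zip(ma_coeffs, past_err_db))
    mean_sq = max(sum(x * x for x in c) / len(c), 1e-12)  # floor: assumption
    innov_energy = 10.0 * math.log10(mean_sq)
    g_pred = 10.0 ** (0.05 * (pred_energy + mean_energy_db - innov_energy))
    return gamma_gc * g_pred


def build_excitation(v, c, g_p, g_c, emphasis=0.25, threshold=0.5):
    """Step 5: total excitation u(n) and its gain-scaled emphasized version."""
    u = [g_p * vn + g_c * cn for vn, cn in zip(v, c)]
    if g_p <= threshold:
        return u, list(u)             # no emphasis, scaling factor is 1.0
    beta = min(g_p, 1.0)
    u_emph = [un + emphasis * beta * g_p * vn for un, vn in zip(u, v)]
    e_u = sum(x * x for x in u)
    e_emph = sum(x * x for x in u_emph)
    eta = math.sqrt(e_u / e_emph) if e_emph > 0.0 else 1.0
    return u, [eta * x for x in u_emph]


def synthesize(exc, a, mem):
    """s(n) = exc(n) - sum_{i=1..10} a_i * s(n - i).

    `a` holds a_1..a_10; `mem` holds the last 10 output samples, newest last.
    """
    hist = list(mem)
    out = []
    for x in exc:
        s = x - sum(a[i - 1] * hist[-i] for i in range(1, LP_ORDER + 1))
        out.append(s)
        hist.append(s)
    return out, hist[-LP_ORDER:]
```

A caller would run these once per subframe, passing the returned filter memory from `synthesize` into the next call so that the synthesis filter state carries across subframe boundaries.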
The synthesized speech is then passed through an adaptive postfilter which is described in the following clause.
6.2 Post‑processing
Post‑processing consists of two functions: adaptive post‑filtering and signal up‑scaling.
6.2.1 Adaptive post‑filtering
The adaptive postfilter is the cascade of two filters: a formant postfilter, and a tilt compensation filter. The postfilter is updated every subframe of 5 ms.
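The formant postfilter is given by:
$$H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)}$$
where $\hat{A}(z)$ is the received quantified (and interpolated) LP inverse filter (LP analysis is not performed at the decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of the formant post‑filtering.
The tilt compensation filter $H_t(z)$ compensates for the tilt in the formant postfilter $H_f(z)$ and is given by:
$$H_t(z) = 1 - \mu z^{-1}$$
where $\mu = \gamma_t k'_1$ is a tilt factor, with $k'_1$ being the first reflection coefficient calculated on the truncated impulse response, $h_f(n)$, of the filter $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$. $k'_1$ is given by:
$$k'_1 = \frac{r_h(1)}{r_h(0)}, \qquad r_h(i) = \sum_{j=0}^{L_h - i - 1} h_f(j)\, h_f(j + i)$$
where $L_h$ is the length of the truncated impulse response.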
The post‑filtering process is performed as follows. First, the synthesized speech $\hat{s}(n)$ is inverse filtered through $\hat{A}(z/\gamma_n)$ to produce the residual signal $\hat{r}(n)$. The signal $\hat{r}(n)$ is filtered by the synthesis filter $1/\hat{A}(z/\gamma_d)$. Finally, the signal at the output of the synthesis filter $1/\hat{A}(z/\gamma_d)$ is passed to the tilt compensation filter $H_t(z)$, resulting in the post‑filtered speech signal $\hat{s}_f(n)$.
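The three filtering stages described above can be sketched as follows. This is a simplified illustration under stated assumptions: all function names are hypothetical, the filters start from zero state on every call (a real decoder would carry filter memories across subframes), and the weighting factors, truncation length, and tilt factor `mu` are supplied by the caller rather than fixed in the code. The list `a` holds the LP coefficients $\hat{a}_1 \ldots \hat{a}_{10}$ without the leading 1.

```python
def weighted(a, gamma):
    """Coefficients of A(z/gamma): [1, a_1*gamma, a_2*gamma**2, ...]."""
    return [1.0] + [ai * gamma ** (i + 1) for i, ai in enumerate(a)]


def pole_zero(num, den, x):
    """Filter x through num(z)/den(z) with zero initial state (den[0] == 1)."""
    y = []
    for n in range(len(x)):
        acc = sum(num[i] * x[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[i] * y[n - i] for i in range(1, len(den)) if n - i >= 0)
        y.append(acc)
    return y


def first_reflection(a, gamma_n, gamma_d, length):
    """k1' = r_h(1) / r_h(0) on the truncated impulse response h_f(n)."""
    impulse = [1.0] + [0.0] * (length - 1)
    h = pole_zero(weighted(a, gamma_n), weighted(a, gamma_d), impulse)
    r0 = sum(x * x for x in h)          # h[0] is 1, so r0 is never zero
    r1 = sum(h[j] * h[j + 1] for j in range(length - 1))
    return r1 / r0


def postfilter(s, a, gamma_n, gamma_d, mu):
    """Inverse filter, weighted synthesis filter, then tilt compensation."""
    residual = pole_zero(weighted(a, gamma_n), [1.0], s)        # A(z/gamma_n)
    shaped = pole_zero([1.0], weighted(a, gamma_d), residual)   # 1/A(z/gamma_d)
    return [shaped[n] - mu * (shaped[n - 1] if n > 0 else 0.0)  # 1 - mu*z^-1
            for n in range(len(shaped))]
```

In use, `first_reflection` would be evaluated once per subframe, the tilt factor formed as $\mu = \gamma_t k'_1$, and the result passed to `postfilter` together with the same `a`, `gamma_n`, and `gamma_d`.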
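Adaptive gain control (AGC) is used to compensate for the gain difference between the synthesized speech signal $\hat{s}(n)$ and the post‑filtered signal $\hat{s}_f(n)$. The gain scaling factor $\gamma_{sc}$ for the present subframe is computed by:
$$\gamma_{sc} = \sqrt{\frac{\sum_{n=0}^{39} \hat{s}^2(n)}{\sum_{n=0}^{39} \hat{s}_f^2(n)}}$$
The gain‑scaled post‑filtered signal $\hat{s}'(n)$ is given by:
$$\hat{s}'(n) = \beta_{sc}(n)\, \hat{s}_f(n)$$
where $\beta_{sc}(n)$ is updated on a sample‑by‑sample basis and given by:
$$\beta_{sc}(n) = \alpha\, \beta_{sc}(n - 1) + (1 - \alpha)\, \gamma_{sc}$$
where $\alpha$ is an AGC factor with a value of 0.9.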
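The AGC recursion above amounts to a first-order smoothing of the per-subframe gain. A minimal floating-point sketch follows; the function name `agc` is hypothetical, and falling back to a scaling factor of 1.0 when the post‑filtered subframe has zero energy is an assumption of this sketch, not something stated in the text.

```python
import math


def agc(s, s_f, beta_prev, alpha=0.9):
    """Scale the post-filtered subframe s_f towards the energy of s.

    beta_prev is the smoothed gain left over from the previous subframe.
    Returns the gain-scaled samples and the updated smoothed gain.
    """
    e_in = sum(x * x for x in s)
    e_out = sum(x * x for x in s_f)
    # Subframe gain scaling factor; 1.0 for a silent output is an assumption.
    gamma_sc = math.sqrt(e_in / e_out) if e_out > 0.0 else 1.0
    beta = beta_prev
    out = []
    for x in s_f:
        beta = alpha * beta + (1.0 - alpha) * gamma_sc   # per-sample update
        out.append(beta * x)
    return out, beta
```

Because the smoothed gain is carried over between calls, a sudden change in the subframe gain is applied gradually over several samples instead of as a step at the subframe boundary.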
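The adaptive post‑filtering factors are given by: $\gamma_n = 0.7$, $\gamma_d = 0.75$, and
$$\gamma_t = \begin{cases} 0.8, & k'_1 > 0 \\ 0, & \text{otherwise} \end{cases}$$
where $k'_1$ is the first reflection coefficient of the truncated impulse response of the formant postfilter.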
6.2.2 Up‑scaling
Up‑scaling consists of multiplying the post‑filtered speech by a factor of 2 to compensate for the down‑scaling by 2 which is applied to the input signal.