3GPP TS 06.91: Substitution and muting of lost frames for AMR speech traffic channels
The C code of the following example is embedded in the bit-exact software of the codec. In the code, the error concealment unit (ECU) is designed to allow subframe-by-subframe synthesis, thereby reducing the speech synthesis delay to a minimum.
6.1 State Machine
This example solution for substitution and muting is based on a state machine with seven states (Figure 1).
The system starts in state 0. Each time a bad frame is detected, the state counter is incremented by one and is saturated when it reaches 6. Each time a good speech frame is detected, the state counter is reset to zero, except when we are in state 6, where we set the state counter to 5. The state indicates the quality of the channel: the larger the value of the state counter, the worse the channel quality is. The control flow of the state machine can be described by the following C code (BFI = bad frame indicator, State = state variable):
if (BFI != 0)
    State = State + 1;
else if (State == 6)
    State = 5;
else
    State = 0;
if (State > 6)
    State = 6;
In addition to this state machine, the Bad Frame Flag from the previous frame is checked (prevBFI). The processing depends on the value of the State-variable. In states 0 and 5, the processing depends also on the two flags BFI and prevBFI.
The procedure can be described as follows:
Figure 1: State machine for controlling the bad frame substitution
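As an illustration, the state update of subclause 6.1 can be wrapped in a small function. This is a minimal sketch, not the codec's bit-exact code; the function name ecu_next_state and its signature are assumptions made for this example.

```c
/* Sketch of the state update described in subclause 6.1.
 * ecu_next_state is a hypothetical name, not taken from the codec. */
int ecu_next_state(int state, int bfi)
{
    if (bfi != 0)
        state = state + 1;   /* bad frame: move one state up */
    else if (state == 6)
        state = 5;           /* good frame while saturated: step down to 5 */
    else
        state = 0;           /* good frame otherwise: reset */

    if (state > 6)
        state = 6;           /* saturate at state 6 */

    return state;
}
```

Starting from state 0, the BFI sequence 1, 1, 1, 1, 1, 1, 1, 0, 0 yields the states 1, 2, 3, 4, 5, 6, 6, 5, 0.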
6.2 Assumed Active Speech Frame Error Concealment Unit Actions
6.2.1 BFI = 0, prevBFI = 0, State = 0
No error is detected in the received or in the previous received speech frame. The received speech parameters are used in the normal way in the speech synthesis. The current frame of speech parameters is saved.
6.2.2 BFI = 0, prevBFI = 1, State = 0 or 5
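No error is detected in the received speech frame, but the previous received speech frame was bad. The LTP gain and fixed codebook gain are limited to at most the values used for the last received good subframe:
$$ g^p = \begin{cases} g^p, & g^p \le g^p(-1) \\ g^p(-1), & g^p > g^p(-1) \end{cases} $$
where $g^p$ = current decoded LTP gain, $g^p(-1)$ = LTP gain used for the last good subframe (BFI = 0), and
$$ g^c = \begin{cases} g^c, & g^c \le g^c(-1) \\ g^c(-1), & g^c > g^c(-1) \end{cases} $$
where $g^c$ = current decoded fixed codebook gain and $g^c(-1)$ = fixed codebook gain used for the last good subframe (BFI = 0).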
The rest of the received speech parameters are used normally in the speech synthesis. The current frame of speech parameters is saved.
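The gain limitation of this subclause amounts to taking the smaller of the decoded gain and the gain of the last good subframe. The following sketch uses floating point for readability, whereas the codec itself is bit-exact fixed point; the function name ecu_limit_gain is an assumption of this example.

```c
/* Limit a decoded gain so that it does not exceed the gain used for
 * the last good subframe (BFI = 0). Applies to both the LTP gain and
 * the fixed codebook gain. Hypothetical helper, floating point only. */
float ecu_limit_gain(float decoded_gain, float last_good_gain)
{
    return (decoded_gain > last_good_gain) ? last_good_gain : decoded_gain;
}
```

The same helper would be called once for the LTP gain and once for the fixed codebook gain of each subframe.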
6.2.3 BFI = 1, prevBFI = 0 or 1, State = 1…6
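An error is detected in the received speech frame and the substitution and muting procedure is started. The LTP gain and fixed codebook gain are replaced by attenuated values from the previous subframes:
$$ g^p = \begin{cases} P(\text{state}) \cdot g^p(-1), & g^p(-1) \le \text{median5}(g^p(-1), \ldots, g^p(-5)) \\ P(\text{state}) \cdot \text{median5}(g^p(-1), \ldots, g^p(-5)), & g^p(-1) > \text{median5}(g^p(-1), \ldots, g^p(-5)) \end{cases} $$
where $g^p$ = current decoded LTP gain, $g^p(-1), \ldots, g^p(-n)$ = LTP gains used for the last n subframes, median5() = 5-point median operation, P(state) = attenuation factor (P(1) = 0.98, P(2) = 0.98, P(3) = 0.8, P(4) = 0.3, P(5) = 0.2, P(6) = 0.2), state = state number, and
$$ g^c = \begin{cases} C(\text{state}) \cdot g^c(-1), & g^c(-1) \le \text{median5}(g^c(-1), \ldots, g^c(-5)) \\ C(\text{state}) \cdot \text{median5}(g^c(-1), \ldots, g^c(-5)), & g^c(-1) > \text{median5}(g^c(-1), \ldots, g^c(-5)) \end{cases} $$
where $g^c$ = current decoded fixed codebook gain, $g^c(-1), \ldots, g^c(-n)$ = fixed codebook gains used for the last n subframes, median5() = 5-point median operation, C(state) = attenuation factor (C(1) = 0.98, C(2) = 0.98, C(3) = 0.98, C(4) = 0.98, C(5) = 0.98, C(6) = 0.7), and state = state number.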
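The gain substitution can be sketched as follows. The sketch assumes that the substituted gain is the attenuation factor times the smaller of the most recent gain and the 5-point median of the last five gains; the attenuation tables are taken from the text above. All function and table names are hypothetical, and floating point is used instead of the codec's bit-exact fixed-point arithmetic.

```c
/* 5-point median: insertion-sort a local copy, return the middle value. */
static float median5(const float x[5])
{
    float s[5];
    int i, j;

    for (i = 0; i < 5; i++) {
        float v = x[i];
        for (j = i; j > 0 && s[j - 1] > v; j--)
            s[j] = s[j - 1];
        s[j] = v;
    }
    return s[2];
}

/* Attenuation factors indexed by state 1..6 (index 0 is unused). */
static const float P_ATT[7] = { 0.0f, 0.98f, 0.98f, 0.8f, 0.3f, 0.2f, 0.2f };
static const float C_ATT[7] = { 0.0f, 0.98f, 0.98f, 0.98f, 0.98f, 0.98f, 0.7f };

/* past[0] is the most recent gain, past[4] the oldest.
 * Assumption: attenuate the smaller of past[0] and the median. */
static float substitute_gain(const float past[5], float attenuation)
{
    float med = median5(past);
    float base = (past[0] <= med) ? past[0] : med;
    return attenuation * base;
}

/* state must be in the range 1..6 (a bad frame has been detected). */
float ecu_substitute_ltp_gain(const float past[5], int state)
{
    return substitute_gain(past, P_ATT[state]);
}

float ecu_substitute_code_gain(const float past[5], int state)
{
    return substitute_gain(past, C_ATT[state]);
}
```

Using the median as an upper bound keeps a single unusually large past gain from dominating the substituted value.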
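The higher the state value is, the more the gains are attenuated. The memory of the predictive fixed codebook gain is also updated by using the average value of the past four values in the memory:
$$ \text{ener}(0) = \frac{1}{4} \sum_{i=1}^{4} \text{ener}(-i) $$
where ener(-i) denotes the i-th past value in the predictor memory.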
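The past LSFs are shifted towards their mean:
$$ \text{lsf\_q1}(i) = \text{lsf\_q2}(i) = \alpha \cdot \text{past\_lsf\_q}(i) + (1 - \alpha) \cdot \text{mean\_lsf}(i), \quad i = 0, \ldots, 9 $$
where $\alpha$ = 0.95, lsf_q1 and lsf_q2 are two sets of LSF-vectors for the current frame, past_lsf_q is lsf_q2 from the previous frame, and mean_lsf is the average LSF-vector. Note that two sets of LSFs are available only in the 12.2 mode.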
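The two memory updates of this subclause, averaging the gain predictor memory and shifting the LSFs towards their mean, can be sketched as below. The vector length of 10, the function names, and the use of floating point are assumptions of this example, not taken from the bit-exact code.

```c
#define LSF_ORDER 10      /* assumed number of LSF coefficients */
#define LSF_ALPHA 0.95f   /* weighting of the past LSFs */

/* Average of the past four values in the gain predictor memory. */
float ecu_average_pred_memory(const float past_ener[4])
{
    return 0.25f * (past_ener[0] + past_ener[1] + past_ener[2] + past_ener[3]);
}

/* Shift the past LSFs towards the mean LSF vector and write the
 * result to both LSF sets of the current frame. */
void ecu_shift_lsf(const float past_lsf_q[LSF_ORDER],
                   const float mean_lsf[LSF_ORDER],
                   float lsf_q1[LSF_ORDER],
                   float lsf_q2[LSF_ORDER])
{
    int i;

    for (i = 0; i < LSF_ORDER; i++) {
        float v = LSF_ALPHA * past_lsf_q[i]
                + (1.0f - LSF_ALPHA) * mean_lsf[i];
        lsf_q1[i] = v;
        lsf_q2[i] = v;
    }
}
```

With a weighting of 0.95, repeated bad frames move the spectral envelope only gradually towards the long-term average.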
6.2.3.1 LTP-lag update
The LTP-lag values are replaced by the past value from the 4th subframe of the previous frame (12.2 mode) or slightly modified values based on the last correctly received value (all other modes).
6.2.3.2 Innovation sequence
When corrupted data are received, the fixed codebook innovation pulses from the erroneous frame are always used in the state in which they were received. When no data were received, random fixed codebook indices should be employed.
6.3 Assumed Non-Active Speech Signal Error Concealment Unit Actions
The Non-Active Speech ECU is used to reduce the negative impact of amplitude variations and tonal artefacts when using the conventional Active Speech ECU in non-voiced signals such as background noise and unvoiced speech. The background ECU actions are only used for the lower rate Speech Coding modes of TCH-FS and TCH-HS.
The Non-Active Speech ECU actions are done as postprocessing actions of the Active Speech ECU actions, thus ensuring that the Active Speech ECU states are continuously updated. This guarantees instant and seamless switching to the Active Speech ECU. The detectors and state updates have to run continuously for all speech coding modes to avoid switching problems.
Only the differences to the Active Speech ECU are stated below.
6.3.1 Background detector
An energy level and energy change detector is used to monitor the signal. If the signal is considered to contain background noise and only shows minor energy level changes, a flag is set. The resulting indicator is the inBackgroundNoise flag which indicates the signal state of the previous frame.
6.3.2 Voicing detector
The received LTP gain is monitored and used to prevent the use of the background ECU actions in possibly voiced segments. A median filtered LTP gain value with a varying filter memory length is thresholded to provide the correct voicing decision. Additionally, a counter voicedHangover is used to monitor the time since a frame was presumably voiced.
6.3.3 Background ECU Actions
The BFI and DFI indications are used together with the flag inBackgroundNoise and the counter voicedHangover to adjust the LTP part and the innovation part of the excitation. The actions are only taken if the previous frame has been classified as background noise and sufficient time has passed since the last voiced frame was detected.
The background ECU actions are: energy control of the excitation signal, relaxed LTP lag control, stronger limitation of the LTP gain, adjusted adaptation of the Gain-Contour-Smoothing algorithm, and modified adaptation of the Anti-Sparseness Procedure.
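The interplay between the voicing detector and the enabling condition for the background ECU actions can be sketched as follows. The text does not give the voicing threshold or the required hangover time, so VOICING_THRESHOLD, MIN_HANGOVER_FRAMES and MAX_HANGOVER_FRAMES are hypothetical values chosen only for illustration, as are the struct and function names. Only inBackgroundNoise and voicedHangover come from the text.

```c
/* Hypothetical tuning values; the text does not specify them. */
#define VOICING_THRESHOLD   0.5f   /* median LTP gain above this: voiced */
#define MIN_HANGOVER_FRAMES 10     /* frames required since last voiced frame */
#define MAX_HANGOVER_FRAMES 1000   /* cap so the counter cannot overflow */

typedef struct {
    int inBackgroundNoise;  /* set by the background detector (previous frame) */
    int voicedHangover;     /* frames since a frame was presumably voiced */
} BgnEcuState;

/* Update the hangover counter from the median-filtered LTP gain. */
void bgn_update_voicing(BgnEcuState *st, float median_ltp_gain)
{
    if (median_ltp_gain > VOICING_THRESHOLD)
        st->voicedHangover = 0;
    else if (st->voicedHangover < MAX_HANGOVER_FRAMES)
        st->voicedHangover++;
}

/* Background ECU actions are enabled only in background noise and
 * only when enough frames have passed since the last voiced frame. */
int bgn_ecu_actions_enabled(const BgnEcuState *st)
{
    return st->inBackgroundNoise
        && st->voicedHangover > MIN_HANGOVER_FRAMES;
}
```

Because the counter is updated on every frame regardless of mode, the enabling decision is available immediately when the decoder switches between the two ECUs.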
6.4 Substitution and muting of lost SID frames
In the speech decoder, a single frame classified as SID_BAD shall be substituted by the last valid SID frame information, and the procedure for valid SID frames shall be applied. If the time between SID information updates (updates are specified by SID_UPDATE arrivals and occasionally by SID_FIRST arrivals; see 06.92) is greater than one second, this shall lead to attenuation.
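The one-second rule can be illustrated with a simple per-frame timer. This sketch assumes 20 ms speech frames, so one second corresponds to 50 frames; the type SidTimer and the function sid_timer_tick are hypothetical names, and the attenuation itself is not shown.

```c
#define FRAME_MS       20     /* assumed frame duration in milliseconds */
#define SID_TIMEOUT_MS 1000   /* one second between SID information updates */

typedef struct {
    int ms_since_update;      /* time elapsed since the last SID update */
} SidTimer;

/* Call once per frame. sid_info_updated is nonzero when the frame
 * carried a SID information update. Returns 1 when the time since
 * the last update exceeds one second, i.e. attenuation should apply. */
int sid_timer_tick(SidTimer *t, int sid_info_updated)
{
    if (sid_info_updated)
        t->ms_since_update = 0;
    else if (t->ms_since_update <= SID_TIMEOUT_MS)
        t->ms_since_update += FRAME_MS;   /* stops growing after the timeout */

    return t->ms_since_update > SID_TIMEOUT_MS;
}
```

Under these assumptions, attenuation starts on the 51st consecutive frame without an update and ends as soon as a new update arrives.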