5.1.3 Sample rate conversion to 12.8 kHz

3GPP TS 26.445, Release 15: Codec for Enhanced Voice Services (EVS); Detailed algorithmic description

The linear predictive (LP) analysis, the long-term prediction (LTP), the VAD algorithm and the signal classification are performed at the 12.8 kHz sampling rate. The HP-filtered input signal is therefore converted from the input sampling frequency to 12.8 kHz.

5.1.3.1 Conversion of 16, 32 and 48 kHz signals to 12.8 kHz

For 16, 32 and 48 kHz signals, the sampling conversion is performed by first up‑sampling the signal to 192 kHz, then filtering the output through a low-pass FIR filter that has the cut‑off frequency at 6.4 kHz. Then, the signal is down-sampled to 12.8 kHz. The filtering delay is 15 samples at 16 kHz sampling frequency which corresponds to 0.9375 ms.
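The rate ratios and the delay figure quoted above can be cross-checked with exact rational arithmetic. The following is only an illustrative sketch; the variable names are not taken from the codec.

```python
from fractions import Fraction

COMMON_RATE_KHZ = Fraction(192)      # intermediate up-sampled rate
TARGET_RATE_KHZ = Fraction("12.8")   # rate used by the pre-processing

# Integer up-sampling factor for each input rate, and the common decimation factor.
up_factors = {rate: COMMON_RATE_KHZ / rate for rate in (16, 32, 48)}
decimation = COMMON_RATE_KHZ / TARGET_RATE_KHZ

# The filtering delay of 15 samples at 16 kHz, expressed in other units.
delay_ms = Fraction(15) / 16
delay_at_192 = delay_ms * COMMON_RATE_KHZ     # samples at 192 kHz
delay_at_12k8 = delay_ms * TARGET_RATE_KHZ    # samples at 12.8 kHz

print(up_factors, decimation, float(delay_ms), delay_at_192, delay_at_12k8)
```

The delay of 180 samples at 192 kHz is half the length of a 361-tap linear phase filter, and it equals 12 samples at 12.8 kHz.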

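The up-sampling is performed by inserting 11, 5 or 3 (for 16, 32 or 48 kHz, respectively) zero-valued samples between each 2 samples for each 20-ms frame of 320 samples (at 16 kHz sampling frequency):

$$s_{192}(n) = \begin{cases} s_{HP}(n/F_{up}), & \text{if } n/F_{up} \text{ is an integer} \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $s_{HP}(n)$ is the HP-filtered input signal, $s_{192}(n)$ is the signal at 192 kHz sampling frequency and $F_{up}$ is the up-sampling factor equal to 12 for a 16 kHz input, 6 for a 32 kHz input and 4 for a 48 kHz input. Then, the signal $s_{192}(n)$ is filtered through the LP filter $H_{6.4}(z)$ and decimated by 15 by keeping one out of 15 samples. The filter $H_{6.4}(z)$ is a 361-tap linear phase FIR filter having a cut-off frequency of 6.4 kHz in the 192 kHz up-sampled domain. The filtering and decimation can be done using the relation

$$s_{12.8}(n) = \sum_{i=-180}^{180} s_{192}(15n+i)\, h_{6.4}(i), \quad n = 0,\ldots,255 \qquad (4)$$

where $h_{6.4}(i)$, $i = -180,\ldots,180$, is the impulse response of $H_{6.4}(z)$. The operations in equations (3) and (4) can be implemented in one step by using only a part of the filter coefficients at a time with an initial phase related to the sampling instant $n$. That is

$$s_{12.8}(n) = \sum_{i:\ |F_{up} i - (15n \bmod F_{up})| \le 180} s_{HP}\!\left(\left\lfloor \frac{15n}{F_{up}} \right\rfloor + i\right) h_{6.4}\!\left(F_{up}\, i - (15n \bmod F_{up})\right), \quad n = 0,\ldots,255 \qquad (5)$$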

In case the encoder is externally forced to narrow-band processing of the input signal, the cut-off frequency of the LP filter is changed from 6.4 kHz to 4 kHz.
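The equivalence between the two-stage procedure (zero insertion, filtering, decimation) and the one-step implementation can be checked numerically. The sketch below is not the codec implementation: the filter table of the codec is not reproduced here, so a Hamming-windowed sinc prototype with a gain equal to the up-sampling factor is assumed, samples outside the current frame are taken as zero, and all function names are hypothetical.

```python
import math

def lowpass_taps(num_taps, cutoff_hz, rate_hz, gain):
    """Hamming-windowed sinc prototype; a stand-in, NOT the codec's coefficient table."""
    half = (num_taps - 1) // 2
    fc = cutoff_hz / rate_hz  # normalized cut-off in cycles per sample
    taps = []
    for i in range(-half, half + 1):
        sinc = 2.0 * fc if i == 0 else math.sin(2.0 * math.pi * fc * i) / (math.pi * i)
        taps.append(gain * sinc * (0.54 + 0.46 * math.cos(math.pi * i / half)))
    return taps  # taps[i + half] holds h(i), i = -half..half

def resample_direct(x, up, down, taps, n_out):
    """Insert zeros, filter, then keep one sample out of `down`."""
    half = (len(taps) - 1) // 2
    stuffed = [0.0] * (len(x) * up)
    stuffed[::up] = x
    out = []
    for n in range(n_out):
        acc = 0.0
        for i in range(-half, half + 1):
            idx = down * n + i
            if 0 <= idx < len(stuffed):  # assumption: zero outside the frame
                acc += stuffed[idx] * taps[i + half]
        out.append(acc)
    return out

def resample_one_step(x, up, down, taps, n_out):
    """Same output, using only every `up`-th coefficient with a phase depending on n."""
    half = (len(taps) - 1) // 2
    out = []
    for n in range(n_out):
        base, phase = divmod(down * n, up)  # integer position and initial phase
        acc = 0.0
        # all i with |up * i - phase| <= half
        for i in range(-((half - phase) // up), (half + phase) // up + 1):
            m = base + i
            if 0 <= m < len(x):
                acc += x[m] * taps[up * i - phase + half]
        out.append(acc)
    return out

# Example: one 20-ms frame of a 1 kHz tone sampled at 16 kHz.
F_UP, DECIM, N_OUT = 12, 15, 256
frame = [math.sin(2.0 * math.pi * 1000.0 * m / 16000.0) for m in range(320)]
h = lowpass_taps(361, 6400.0, 192000.0, gain=F_UP)
direct = resample_direct(frame, F_UP, DECIM, h, N_OUT)
one_step = resample_one_step(frame, F_UP, DECIM, h, N_OUT)
print(max(abs(a - b) for a, b in zip(direct, one_step)))
```

The one-step form touches about 31 coefficients per output sample for a 16 kHz input instead of 361, which is the complexity saving the text refers to.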

5.1.3.2 Conversion of 8 kHz signals to 12.8 kHz

For 8 to 12.8 kHz resampling, a sharper resampling filter is beneficial, so a double-length low-pass FIR filter is used in this case. The doubling of the impulse response length is compensated by a low-delay resampling method. The filter is a 241-tap linear phase FIR filter having a cut-off frequency of 3.9 kHz and is applied in the up-sampled domain, which is 64 kHz. Direct FIR filtering with this filter would yield a delay of 120/64 = 1.875 ms. In order to reduce this delay to 0.9375 ms, future samples are determined at 8 kHz by adaptive linear prediction. The exact number of future samples is found based on the difference between the actual delay (1.875 ms) and the desired delay (0.9375 ms) at 8 kHz. This difference of 0.9375 ms corresponds to 7.5 samples at 8 kHz; therefore 7 future samples are predicted.

These predicted samples are concatenated at the end of the current frame $s_{HP}(n)$ to form a support vector $s_{HPC}(n)$. Then, the sample rate conversion of $s_{HPC}(n)$ is performed in a similar way as for the other sampling rates, i.e. $s_{HPC}(n)$ is first up-sampled to 64 kHz, the output is filtered through the low-pass FIR filter and the resulting signal is down-sampled to 12.8 kHz. The final filtering delay is aligned with that of the other resampling configurations, i.e. 12 samples at 12.8 kHz sampling frequency, which corresponds to 0.9375 ms.
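The delay budget described above can be verified with a few lines of exact arithmetic. This is a sketch only: the rounding down of the 7.5-sample difference is an assumption chosen because it reproduces the 7 predicted samples used in the remainder of this subclause, and the variable names are illustrative.

```python
import math
from fractions import Fraction

TAPS = 241            # length of the low-pass FIR filter
UP_RATE_KHZ = 64      # up-sampled domain
INPUT_RATE_KHZ = 8

actual_delay_ms = Fraction((TAPS - 1) // 2, UP_RATE_KHZ)   # 120/64 ms
desired_delay_ms = Fraction(15, 16)                         # 0.9375 ms

# Samples at 8 kHz that must be known ahead of time to remove the excess delay.
excess_samples = (actual_delay_ms - desired_delay_ms) * INPUT_RATE_KHZ
future_samples = math.floor(excess_samples)   # assumption: round down

support_length = 160 + future_samples                       # frame plus predicted samples
delay_at_12k8 = desired_delay_ms * Fraction("12.8")         # samples at 12.8 kHz

print(float(actual_delay_ms), float(excess_samples), future_samples, support_length, delay_at_12k8)
```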

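To determine the future samples, linear prediction coefficients of order 16 are computed in the pre-emphasized domain in the following way. The last $L_{ss} = 120$ samples of the input frame $s_{HP}(n)$ at 8 kHz are windowed by an asymmetrical analysis window $\mathrm{win}_{ss\_120}$:

$$s_w(n) = s_{HP}(n + 40)\, \mathrm{win}_{ss\_120}(n), \quad n = 0,\ldots,L_{ss}-1 \qquad (6)$$

and a first order autocorrelation analysis is made on the windowed signal $s_w(n)$. The pre-emphasis coefficient $\mu_{ss}$ is obtained by

$$\mu_{ss} = \frac{r_w(1)}{r_w(0)} \qquad (7)$$

where $r_w(0)$ and $r_w(1)$ are the autocorrelation coefficients

$$r_w(k) = \sum_{n=k}^{L_{ss}-1} s_w(n)\, s_w(n-k), \quad k = 0, 1 \qquad (8)$$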

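The last 120 samples of the signal $s_{HP}(n)$ are pre-emphasized using the adaptive filter

$$H_{\text{pre-emph\_ss}}(z) = 1 - \mu_{ss} z^{-1} \qquad (9)$$

where $\mu_{ss}$ is the pre-emphasis coefficient, to obtain the pre-emphasized signal $s_{pe}(n)$ of $L_{ss} = 120$ samples. Then $s_{pe}(n)$ is windowed by the asymmetrical analysis window $\mathrm{win}_{ss\_120}$ and a 16th order autocorrelation analysis is made on the windowed signal $s_{pw}(n)$:

$$r_{pw}(k) = \sum_{n=k}^{L_{ss}-1} s_{pw}(n)\, s_{pw}(n-k), \quad k = 0,\ldots,16 \qquad (10)$$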
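The analysis steps described above (windowing, first order autocorrelation, adaptive pre-emphasis, 16th order autocorrelation) can be sketched as follows. The window table of the codec is not reproduced in this subclause, so `placeholder_window` is a purely hypothetical asymmetric window. The coefficient is assumed to be the normalized first autocorrelation coefficient, and the sample preceding the analysed segment is assumed to serve as filter memory. All function names are illustrative.

```python
import math

L_SS = 120  # number of analysed samples at 8 kHz

def autocorr(x, max_lag):
    """Autocorrelation r(k) = sum_n x(n) x(n-k) for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def placeholder_window(length):
    """Hypothetical asymmetric window (long sine rise, short cosine fall); NOT the codec table."""
    rise = (3 * length) // 4
    fall = length - rise
    return ([math.sin(0.5 * math.pi * (n + 0.5) / rise) for n in range(rise)] +
            [math.cos(0.5 * math.pi * (n + 0.5) / fall) for n in range(fall)])

def adaptive_preemphasis(frame_8k, window, order=16):
    """Return (mu, pre-emphasized segment, autocorrelation of its windowed version)."""
    tail = frame_8k[-L_SS:]
    s_w = [t * w for t, w in zip(tail, window)]
    r_w = autocorr(s_w, 1)
    mu = r_w[1] / r_w[0] if r_w[0] > 0.0 else 0.0  # assumed form of the coefficient
    # Pre-emphasis 1 - mu z^-1; assumption: previous input sample is the filter memory.
    prev = frame_8k[-L_SS - 1] if len(frame_8k) > L_SS else 0.0
    s_pe = []
    for t in tail:
        s_pe.append(t - mu * prev)
        prev = t
    s_pw = [t * w for t, w in zip(s_pe, window)]
    return mu, s_pe, autocorr(s_pw, order)

# Example: a 200 Hz tone, which is strongly low-pass, gives a coefficient close to 1.
frame = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(160)]
mu, s_pe, r_pw = adaptive_preemphasis(frame, placeholder_window(L_SS))
print(mu, len(s_pe), len(r_pw))
```

Because the coefficient follows the spectral tilt of each frame, the pre-emphasized signal is flatter than the input, which generally makes the subsequent 16th order analysis better conditioned.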

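These autocorrelation coefficients are lag-windowed by

$$r_{pwl}(k) = r_{pw}(k)\, w_{lag8k}(k), \quad k = 0,\ldots,16 \qquad (11)$$

where $r_{pw}(k)$ are the autocorrelation coefficients of the windowed pre-emphasized signal and $w_{lag8k}(k)$ is defined as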

(12)

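Based on the autocorrelation coefficients $r_{pwl}(k)$, the linear prediction coefficients $a_{ss}(k)$ are computed by the Levinson-Durbin algorithm. The future samples in the pre-emphasized domain are predicted by zero input filtering through the $1/A_{ss}(z)$ synthesis filter

$$s_{pe}(n) = -\sum_{j=1}^{16} a_{ss}(j)\, s_{pe}(n-j), \quad n = 120,\ldots,126 \qquad (13)$$

where $s_{pe}(n)$, $n = 0,\ldots,119$, is the pre-emphasized signal and $A_{ss}(z) = 1 + \sum_{j=1}^{16} a_{ss}(j) z^{-j}$.

Finally, the concatenated signal is de-emphasized through the filter $1/(1 - \mu_{ss} z^{-1})$, where $\mu_{ss}$ is the pre-emphasis coefficient. Note that only the last 7 predicted samples need to be de-emphasized. These 7 de-emphasized samples are concatenated to $s_{HP}(n)$ (at positions $n = 160,\ldots,166$) to form the support vector $s_{HPC}(n)$.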
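The prediction of the future samples can be sketched as below. This is a minimal illustration and not the codec code. It assumes the convention A(z) = 1 + sum of a(j) z^-j for the prediction filter, takes already lag-windowed autocorrelation coefficients as input, and starts the de-emphasis from the last real input sample. The function names are hypothetical.

```python
def levinson_durbin(r, order):
    """Return [1, a(1), ..., a(order)] for A(z) = 1 + sum_j a(j) z^-j."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input, keep the coefficients found so far
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a

def predict_future(s_pe, a, count):
    """Zero-input response of 1/A(z), continuing the pre-emphasized signal."""
    order = len(a) - 1
    buf = list(s_pe)
    for _ in range(count):
        buf.append(-sum(a[j] * buf[-j] for j in range(1, order + 1)))
    return buf[-count:]

def deemphasize(predicted, mu, last_sample):
    """Apply 1/(1 - mu z^-1) to the predicted samples only."""
    out, prev = [], last_sample
    for v in predicted:
        prev = v + mu * prev
        out.append(prev)
    return out

def support_vector(frame_8k, s_pe, r_pwl, mu, order=16, count=7):
    """Frame of 160 samples followed by `count` predicted, de-emphasized samples."""
    a = levinson_durbin(r_pwl, order)
    future = deemphasize(predict_future(s_pe, a, count), mu, frame_8k[-1])
    return list(frame_8k) + future

# Toy check with a first order process r(k) = 0.5**k, for which A(z) = 1 - 0.5 z^-1.
a_toy = levinson_durbin([1.0, 0.5, 0.25], 2)
print(a_toy, predict_future([1.0, 8.0], a_toy, 3), deemphasize([4.0, 2.0, 1.0], 0.5, 8.0))
```

Only the predicted samples go through the de-emphasis because pre-emphasis followed by de-emphasis leaves the real input samples unchanged.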

The up-sampling of $s_{HPC}(n)$ is then performed by inserting 7 zero-valued samples between each 2 samples for each 20-ms frame of 160 samples (at 8 kHz sampling frequency) completed by 7 predicted future samples (167 in total):

$$s_{64}(n) = \begin{cases} s_{HPC}(n/8), & \text{if } n/8 \text{ is an integer} \\ 0, & \text{otherwise} \end{cases}, \quad n = 0,\ldots,1335 \qquad (14)$$

where $s_{HPC}(n)$ is the support vector and $s_{64}(n)$ is the signal at 64 kHz sampling frequency. Then, the signal $s_{64}(n)$ is filtered through the 241-tap LP filter $H_{3.9}(z)$ and decimated by 5 by keeping one out of 5 samples. The filtering and decimation can be done using the relation

$$s_{12.8}(n) = \sum_{i=-120}^{120} s_{64}(5n + i - 60)\, h_{3.9}(i), \quad n = 0,\ldots,255 \qquad (15)$$

where $h_{3.9}$ is the impulse response of $H_{3.9}(z)$ and the offset of 60 samples assures that the index of $s_{64}$ is never higher than the highest available index for $s_{64}$ (which is 1335). Indeed, it corresponds to the delay of this filtering at 64 kHz. To reduce complexity, the operations in equations (14) and (15) can be implemented in one step by using only a part of the filter coefficients at a time with an initial phase related to the sampling instant $n$. This polyphase implementation of the resampling filter is applied on the concatenated support vector. That is

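$$s_{12.8}(n) = \sum_{m:\ |8m - 5n + 60| \le 120} s_{HPC}(m)\, h_{3.9}(8m - 5n + 60), \quad n = 0,\ldots,255 \qquad (16)$$

where $s_{HPC}(m)$ is the support vector, $h_{3.9}$ is the impulse response of the 241-tap LP filter, and the offset of 60 is the delay of this filtering at 64 kHz, i.e. 7.5 samples at 8 kHz. It assures that the index of $s_{HPC}$ is never higher than the highest available index (which is 166).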
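The index bookkeeping of the one-step implementation on the 167-sample support vector can be checked numerically. The sketch below makes several assumptions: the filter is a Hamming-windowed sinc prototype rather than the codec's table, the output is delayed by 60 samples at 64 kHz, and samples before the start of the frame (filter memory from the previous frame) are taken as zero. The names are illustrative.

```python
import math

UP, DOWN, HALF, OFFSET = 8, 5, 120, 60  # 64 kHz domain: 241 taps, 60-sample delay

def prototype_taps():
    """Hamming-windowed sinc, 3.9 kHz cut-off at 64 kHz; a stand-in for the codec table."""
    fc = 3900.0 / 64000.0
    taps = []
    for i in range(-HALF, HALF + 1):
        sinc = 2.0 * fc if i == 0 else math.sin(2.0 * math.pi * fc * i) / (math.pi * i)
        taps.append(UP * sinc * (0.54 + 0.46 * math.cos(math.pi * i / HALF)))
    return taps  # taps[i + HALF] holds h(i)

def resample_direct(support, taps, n_out=256):
    """Insert zeros up to 64 kHz, filter, keep one sample out of 5."""
    s64 = [0.0] * (len(support) * UP)
    s64[::UP] = support
    out = []
    for n in range(n_out):
        acc = 0.0
        for i in range(-HALF, HALF + 1):
            idx = DOWN * n + i - OFFSET
            if 0 <= idx < len(s64):  # assumption: zero memory before the frame
                acc += s64[idx] * taps[i + HALF]
        out.append(acc)
    return out

def highest_index(n):
    """Highest support-vector index needed for output sample n."""
    return (DOWN * n - OFFSET + HALF) // UP

def resample_one_step(support, taps, n_out=256):
    """Same output computed directly from the support vector."""
    out = []
    for n in range(n_out):
        centre = DOWN * n - OFFSET
        m_min = -((HALF - centre) // UP)  # ceil((centre - HALF) / UP)
        acc = 0.0
        for m in range(max(m_min, 0), min(highest_index(n), len(support) - 1) + 1):
            acc += support[m] * taps[UP * m - centre + HALF]
        out.append(acc)
    return out

# Example: 160 frame samples plus 7 future samples of a 500 Hz tone at 8 kHz.
support = [math.sin(2.0 * math.pi * 500.0 * m / 8000.0) for m in range(167)]
h = prototype_taps()
direct = resample_direct(support, h)
one_step = resample_one_step(support, h)
print(max(abs(a - b) for a, b in zip(direct, one_step)), highest_index(255))
```

With 256 output samples per frame the highest index requested is 166, so the 7 predicted samples are exactly what the filter needs at the end of the frame.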

5.1.3.3 Conversion of input signals to 16, 25.6 and 32 kHz

If the ACELP core is selected for WB, SWB or FB signals at bitrates higher than 13.2 kbps (see subclause 5.1.16), its internal sampling rate is set to 16 kHz rather than 12.8 kHz. If the input signal is sampled at 8 kHz, no conversion is needed because, for NB signals, the ACELP core is always operated at 12.8 kHz. If the input signal is sampled at 16 kHz, no conversion is needed either and the input signal is only delayed by 15 samples, which corresponds to 0.9375 ms. This is to keep all pre-processed signals aligned regardless of the bitrate or bandwidth. Thus, the input signal is resampled to 16 kHz only if its sampling frequency is 32 or 48 kHz.

The resampling operation is done in the same way as for the case of 12.8 kHz (see subclause 5.1.3.1), i.e. by means of FIR filtering. The coefficients of the LP filter are different but the filtering delay is still the same, i.e. 0.9375 ms.

The resampled signal is denoted $s_{16}(n)$, where $n = 0,\ldots,319$.

The input signal is converted to 25.6 kHz at 48 kbps and to 32 kHz at 96 or 128 kbps, but only for SWB and FB signals. The sampling conversion is again similar to the case of 12.8 kHz, with differences in the LP filter coefficients. The resampled signals are denoted $s_{25.6}(n)$ and $s_{32}(n)$, respectively.
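The rational conversion ratios and the frame lengths implied by these target rates follow from simple arithmetic, sketched below. This only illustrates the ratios; it does not describe the filter structure used by the codec, and the function names are hypothetical.

```python
from fractions import Fraction

def conversion_ratio(input_khz, target_khz):
    """Return (up, down) in lowest terms such that target = input * up / down."""
    ratio = Fraction(str(target_khz)) / Fraction(str(input_khz))
    return ratio.numerator, ratio.denominator

def frame_length(rate_khz, frame_ms=20):
    """Number of samples in one frame at the given rate."""
    return int(Fraction(str(rate_khz)) * frame_ms)

for fin, fout in [(32, 16), (48, 16), (32, 25.6), (48, 25.6), (48, 32)]:
    print(fin, "kHz ->", fout, "kHz:", conversion_ratio(fin, fout),
          "frame of", frame_length(fout), "samples")
```

The 320 samples per frame at 16 kHz match the index range n = 0,...,319 given above.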