C8 Experiments 2a, 2b & 2c: No degradation of Speech and no Undesirable Effects in Residual Noise in Conditions with Background Noise (ACR)

3GPP TS 06.77: Minimum Performance Requirements for Noise Suppresser Application to the AMR Speech Encoder

C8.1 Introduction

These ACR experiments are designed to test the requirement “No degradation of Speech and no Undesirable Effects in Residual Noise” in the Minimum Performance Requirements for Noise Suppresser Application to the AMR Speech Encoder [1]. The experiments will be run for three types of acoustic background noise.

C8.2 Test Factors and Conditions

The ACR test will be run for the following three types of acoustic background noise:

  • A car noise that is stationary both in level and in spectrum.
  • A street noise that is non-stationary in level but fairly stationary in spectrum.
  • A babble noise that is fairly stationary in level but non-stationary in spectrum.

This results in a total of three ACR experiments, one for each noise type. Within each experiment, a low, a medium and a high SNR level will be tested. The low SNR values are SNR_C = 6 dB for the car noise, SNR_S = 9 dB for the street noise, and SNR_B = 9 dB for the babble noise. The medium and high SNR values will be equal to SNR + 6 dB and SNR + 12 dB, respectively, for all three noise types. The noise samples will have been recorded in scenarios representative of the respective low SNR value for each noise type (i.e. SNR = 6 or 9 dB).
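The SNR levels per noise type follow directly from the low SNR values and the two offsets given above. A minimal sketch of that arithmetic is shown below; the dictionary and function names are illustrative only and do not come from the specification.

```python
# Low SNR per noise type in dB, as stated in the text (SNR_C, SNR_S, SNR_B).
LOW_SNR_DB = {"car": 6, "street": 9, "babble": 9}

# Offsets of the three tested levels relative to the low SNR, in dB.
SNR_OFFSETS_DB = {"low": 0, "medium": 6, "high": 12}


def snr_levels(noise_type: str) -> dict[str, int]:
    """Return the low, medium and high SNR (dB) for one noise type."""
    base = LOW_SNR_DB[noise_type]
    return {name: base + offset for name, offset in SNR_OFFSETS_DB.items()}


if __name__ == "__main__":
    for noise in LOW_SNR_DB:
        print(noise, snr_levels(noise))
```

The resulting values can be compared against the Acoustic Background Noise entry of the factors table.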

All three experiments are run at the AMR bit rates 12.2 kbit/s and 5.9 kbit/s.

The following table shows the testing factors to be used in these experiments. A full list of test conditions is given in Section 8.12.

| Main Codec Conditions | # | Notes |
|---|---|---|
| Noise Suppresser Algorithms | 1 | |
| Codec | 1 | AMR |
| Codec Modes | 2 | 12.2 kbps rate, 5.9 kbps rate |
| BERs | 0 | Clear channel, no transmission errors |
| Input level | 3 | nominal (high, low): -26 dB (-16 dB, -36 dB) relative to OVL |
| Acoustic Background Noise | 3 | Static Car @ 6 dB, 12 dB, 18 dB; Street @ 9 dB, 15 dB, 21 dB; Babble @ 9 dB, 15 dB, 21 dB |
| Input Characteristic | 1 | GSM Filtered |
| VAD/CNG/DTX | 2 | ON only at the nominal level, medium SNR values, zero value of Ideal NS; OFF for other conditions. One VAD/CNG/DTX will be used: either VAD Option 1 or 2, depending on the implementer's choice |

| Codec references | # | Notes |
|---|---|---|
| All Experiments | 1 | AMR wo/ NS |

| Other references | # | Notes |
|---|---|---|
| Direct | | Nominal level, GSM Filtered |
| MNRU, Exp 2a, 2b, 2c | 5 | Nominal level, with background noise, GSM Filtered, Q = 6, 12, 18, 24, 30 dB |
| Ideal Noise Suppression | 6 | 3 levels for each SNR |

| Common Conditions | # | Notes |
|---|---|---|
| GSM Channel | 0 | NO channel model |
| Number of talkers | 4 | 2 male + 2 female |
| Number of speech samples | 28 | 6 / talker for the main test + 1 / talker for the Practice session |
| Listening Level | 1 | -15 dBPa (79 dB SPL) at ERP |
| Listeners | 24 | Naive Listeners |
| Randomizations | 6 | 6 groups of 4 listeners |
| Rating Scale | 1 | Modified ACR Instructions |
| Replications | 1 | Original Presentation Only |

Table 8.2.1: Factors and conditions for Experiments 2a, 2b, 2c

C8.3 Preliminary Conditions

The following 16 preliminary test conditions are recommended.

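| Cond. | Presentation order | SNR value | Ideal NS (dB) | Codec | Talker and Sample Number |
|---|---|---|---|---|---|
| P1 | 5 | SNR | | Direct | M1S07 |
| P2 | 1 | SNR | | MNRU-12 | M2S07 |
| P3 | 3 | SNR | | AMR@12.2 | M1S07 |
| P4 | 7 | SNR | 7 | AMR@12.2 | M2S07 |
| P5 | 6 | SNR+6 | 7 | AMR@12.2 | F1S07 |
| P6 | 2 | SNR+12 | 7 | AMR@12.2 | F2S07 |
| P7 | 4 | SNR | | AMR@5.9 | F1S07 |
| P8 | 8 | SNR+12 | | AMR@5.9 | F2S07 |
| P9 | 14 | SNR | | Direct | F1S07 |
| P10 | 10 | SNR | | MNRU-12 | F2S07 |
| P11 | 12 | SNR | | AMR@12.2 | F1S07 |
| P12 | 16 | SNR | 7 | AMR@12.2 | F2S07 |
| P13 | 13 | SNR+6 | 7 | AMR@12.2 | M1S07 |
| P14 | 9 | SNR+12 | 7 | AMR@12.2 | M2S07 |
| P15 | 11 | SNR | | AMR@5.9 | M1S07 |
| P16 | 15 | SNR+12 | | AMR@5.9 | M2S07 |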

Table 8.3.1: List of preliminary conditions

C8.4 Speech Material

The speech material should be as defined in Section 6.4 – Long Sentence Quads, with each sample containing 4 sentences. For each test condition there are:

  • 6 samples per talker, each sample 16 s long with 4 sentences
  • 24 unique sentences per talker

For the practice conditions there are:

  • 1 sample per talker
  • 4 unique sentences per talker

To reduce any speech material effect, each talker sample must be unique. For these experiments, the unique samples are not balanced across all conditions, candidates and subject groups. The same sample numbers for each talker are used for common conditions within a subject group and changed across subject groups. For a given language, the same speech material must be used for the three experiments 2a, 2b and 2c.

Speech samples numbered from 01 to 06 should be used for the test conditions; the speech sample numbered 07 should be used for the Practice session.

The noise material and its mix with the speech material should be as defined in Section 6.10 and Section 8.2.

C8.5 Experimental Design

The design is based on a restricted randomization philosophy using 6 different randomizations, each one covered by a group of 4 of the 24 subjects. This means that up to 4 subjects can perform the experiment simultaneously.

Each subject will hear all of the conditions four times, once with speech from each of the four talkers. Over the experiment as a whole, each of the conditions will be paired with six different samples from each of the four talkers. Each of the six groups of subjects will hear different combinations of source material and condition.

C8.6 Processing

Every condition has to be processed for each of the six stimuli of each of the four primary talkers. The actual samples used for each condition by each subject group are presented in Section 8.12 Test Conditions.
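The assignment of speech samples to subject groups in the test condition list appears to be a cyclic rotation of the sample numbers 1 to 6. The sketch below illustrates this reading. It assumes that the six entries in the speech sample number column correspond to subject groups 1 to 6 in order; the function names and the family labels are hypothetical.

```python
def rotate(samples: list[int], shift: int) -> list[int]:
    """Cyclically rotate the sample list to the left by `shift` positions."""
    shift %= len(samples)
    return samples[shift:] + samples[:shift]


SAMPLES = [1, 2, 3, 4, 5, 6]  # speech sample numbers 01 to 06

# Rotation per condition family, read off the speech sample number column.
# "reference" stands for the Direct and MNRU conditions (assumed label).
SHIFT_BY_FAMILY = {"SNR": 0, "SNR+6": 1, "SNR+12": 2, "reference": 3}


def sample_for(family: str, group: int) -> int:
    """Sample number heard by subject group 1..6 for a condition family."""
    return rotate(SAMPLES, SHIFT_BY_FAMILY[family])[group - 1]


if __name__ == "__main__":
    for family in SHIFT_BY_FAMILY:
        print(family, rotate(SAMPLES, SHIFT_BY_FAMILY[family]))
```

With this rotation, every sample number is used once per condition across the six subject groups.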

C8.7 Randomizations

Separate randomizations for each of the six subject groups shall be provided to reduce order effects and to minimize differences between the laboratories. There shall be six randomizations for the sub-experiments, one for each subject group. The same randomizations will be used for the three experiments (2a, 2b and 2c). Each one will therefore be used by four of the 24 subjects. Each randomization shall be balanced across 4 blocks of 36 stimuli to eliminate long sequences of similar conditions or identical talkers. The sequences shall provide for alternating male-female talkers.
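One possible way to generate a sequence with these properties is sketched below. It is not the procedure used to produce the official randomizations: the talker labels, the block construction and the seeding are assumptions made for illustration. Each block contains every condition once, each condition meets each talker once over the four blocks, and male and female talkers alternate.

```python
import random

TALKERS = ["M1", "F1", "M2", "F2"]  # hypothetical labels, 2 male + 2 female
N_CONDITIONS = 36
N_BLOCKS = 4


def make_randomization(seed: int) -> list[tuple[int, str]]:
    """Return 144 (condition, talker) pairs in presentation order."""
    rng = random.Random(seed)
    order: list[tuple[int, str]] = []
    for block in range(N_BLOCKS):
        male, female = [], []
        for cond in range(1, N_CONDITIONS + 1):
            # Latin-square style assignment: shifting the talker index by
            # the block number pairs each condition with each talker once.
            talker = TALKERS[(cond + block) % len(TALKERS)]
            (male if talker.startswith("M") else female).append((cond, talker))
        rng.shuffle(male)
        rng.shuffle(female)
        # Interleave the two lists so that male and female talkers alternate.
        for m, f in zip(male, female):
            order.extend([m, f])
    return order


if __name__ == "__main__":
    # One sequence per subject group, using six different seeds.
    sequences = [make_randomization(seed) for seed in range(6)]
    print(sequences[0][:6])
```

A production randomization would likely add further restrictions, for example on how close together similar conditions may appear.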

C8.8 Duration of the ACR Experiments 2a, 2b, and 2c

Each stimulus consists of a 16 s speech sample plus 5 s voting time, i.e. 21 seconds. For each of the three experiments there are 16 preliminary conditions x 21 seconds, or 5.6 minutes, for an introductory block. The test consists of 36 conditions x 4 talkers x 21 seconds, or 50.4 minutes, presented as three 16.8 minute blocks of 48 stimuli, for 56 minutes testing time per subject group. The 6 groups of 4 subjects require 5 hours and 36 minutes total testing time.

To reduce the effects of subject fatigue, the three blocks should be separated by short comfort breaks.

Note that the above calculations do not include the time needed to give the subjects their instructions, or for comfort breaks.
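The per-group timing figures follow from simple arithmetic, reproduced here as a short sketch. The constant and function names are illustrative only.

```python
STIMULUS_S = 16 + 5           # 16 s speech sample + 5 s voting time
PRELIMINARY_CONDITIONS = 16
CONDITIONS = 36
TALKERS = 4
BLOCKS = 3                    # main test presented as three blocks


def session_seconds() -> dict[str, int]:
    """Return the introductory, main and total durations in seconds."""
    intro = PRELIMINARY_CONDITIONS * STIMULUS_S
    main = CONDITIONS * TALKERS * STIMULUS_S
    return {"intro": intro, "main": main, "per_group": intro + main}


if __name__ == "__main__":
    t = session_seconds()
    print(f"introductory block: {t['intro'] / 60:.1f} min")
    print(f"main test:          {t['main'] / 60:.1f} min")
    print(f"each main block:    {t['main'] / BLOCKS / 60:.1f} min")
    print(f"per subject group:  {t['per_group'] / 60:.1f} min")
```

Working in whole seconds and converting to minutes only for display avoids rounding surprises.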

C8.9 Votes Per Condition

In each of the three experiments, every condition will have 24 subjects vote on one stimulus from each of four talkers, giving:

(24 subjects x 4 talkers) = 96 votes per condition

From past experience of ACR tests, this is the minimum number of votes per condition needed to give enough statistical certainty to differentiate the performance of one candidate process from another candidate process over the conditions and against the references.

C8.10 Test Procedure

Factors important for the experimental environment are specified in Sections 6.5 and 6.6. As specified in Section 9.8, comfort breaks should be provided to reduce the effects of subject fatigue.

C8.11 Opinion Scale

The question asked of the subject is a modification of the ACR Listening Quality Scale. The specific wording is designed to evaluate both the level of distortion of the speech and the presence of artifacts in the residual background noise signal. The subjects will listen to each sample and after it has completed they will be asked to give their opinion.

Annex A contains an example of the instructions for the subjects in English. The instructions in Annex A contain a modified version of the ACR instructions. They are aimed at focusing the subjects on rating artifacts introduced by the NS device. The test administrator should have the freedom to provide guidance to the subjects to reinforce this point, provided that such instructions are consistent across all 24 subjects. This is particularly important for tests not performed in English. Any additional instructions given to the subjects should be reported as an integral part of the test report.

C8.12 Test Conditions for Experiments 2a, 2b and 2c

| Cond. | Input level | SNR value | Ideal NS (dB) | VAD/DTX | Codec | Speech sample number (6 sequences) |
|---|---|---|---|---|---|---|
| 1 | nominal | SNR | | N/A | Direct | 4 5 6 1 2 3 |
| 2 | nominal | SNR | | N/A | MNRU-30 | 4 5 6 1 2 3 |
| 3 | nominal | SNR | | N/A | MNRU-24 | 4 5 6 1 2 3 |
| 4 | nominal | SNR | | N/A | MNRU-18 | 4 5 6 1 2 3 |
| 5 | nominal | SNR | | N/A | MNRU-12 | 4 5 6 1 2 3 |
| 6 | nominal | SNR | | N/A | MNRU-6 | 4 5 6 1 2 3 |
| 7 | nominal | SNR | | off | AMR@12.2 | 1 2 3 4 5 6 |
| 8 | nominal | SNR | 4 | off | AMR@12.2 | 1 2 3 4 5 6 |
| 9 | nominal | SNR | 7 | off | AMR@12.2 | 1 2 3 4 5 6 |
| 10 | nominal | SNR | | off | AMR@5.9 | 1 2 3 4 5 6 |
| 11 | high | SNR | | off | AMR@12.2 | 1 2 3 4 5 6 |
| 12 | high | SNR | | off | AMR@5.9 | 1 2 3 4 5 6 |
| 13 | nominal | SNR+6 | | off | AMR@12.2 | 2 3 4 5 6 1 |
| 14 | nominal | SNR+6 | 4 | off | AMR@12.2 | 2 3 4 5 6 1 |
| 15 | nominal | SNR+6 | 7 | off | AMR@12.2 | 2 3 4 5 6 1 |
| 16 | nominal | SNR+6 | | off | AMR@5.9 | 2 3 4 5 6 1 |
| 17 | nominal | SNR+6 | | on | AMR@12.2 | 2 3 4 5 6 1 |
| 18 | nominal | SNR+6 | | on | AMR@5.9 | 2 3 4 5 6 1 |
| 19 | low | SNR+6 | | off | AMR@12.2 | 2 3 4 5 6 1 |
| 20 | low | SNR+6 | | off | AMR@5.9 | 2 3 4 5 6 1 |
| 21 | nominal | SNR+12 | | off | AMR@12.2 | 3 4 5 6 1 2 |
| 22 | nominal | SNR+12 | 4 | off | AMR@12.2 | 3 4 5 6 1 2 |
| 23 | nominal | SNR+12 | 7 | off | AMR@12.2 | 3 4 5 6 1 2 |
| 24 | nominal | SNR+12 | | off | AMR@5.9 | 3 4 5 6 1 2 |
| 25 | nominal | SNR | | off | AMR/NS@12.2 | 1 2 3 4 5 6 |
| 26 | nominal | SNR | | off | AMR/NS@5.9 | 1 2 3 4 5 6 |
| 27 | nominal | SNR+6 | | off | AMR/NS@12.2 | 2 3 4 5 6 1 |
| 28 | nominal | SNR+6 | | off | AMR/NS@5.9 | 2 3 4 5 6 1 |
| 29 | nominal | SNR+12 | | off | AMR/NS@12.2 | 3 4 5 6 1 2 |
| 30 | nominal | SNR+12 | | off | AMR/NS@5.9 | 3 4 5 6 1 2 |
| 31 | nominal | SNR+6 | | on | AMR/NS@12.2 | 2 3 4 5 6 1 |
| 32 | nominal | SNR+6 | | on | AMR/NS@5.9 | 2 3 4 5 6 1 |
| 33 | low | SNR+6 | | off | AMR/NS@12.2 | 2 3 4 5 6 1 |
| 34 | low | SNR+6 | | off | AMR/NS@5.9 | 2 3 4 5 6 1 |
| 35 | high | SNR | | off | AMR/NS@12.2 | 1 2 3 4 5 6 |
| 36 | high | SNR | | off | AMR/NS@5.9 | 1 2 3 4 5 6 |

Note:

Experiment 2a: Car noise with SNR = SNR_C = 6 dB

Experiment 2b: Street noise with SNR = SNR_S = 9 dB

Experiment 2c: Babble noise with SNR = SNR_B = 9 dB

C8.13 Statistical Analysis

The statistics to be reported from this ACR test are the averaged MOS ($\bar{Y}$) scores and the standard deviations ($S$) for all the conditions.

Additionally, the requirement in [1, Section 6.1.3] should be checked for conditions 25-36 using a hypothesis test of whether the mean MOS score is greater than or equal to the MOS score of the corresponding reference condition for AMR without NS (all else being equal except that NS is activated), at a 95 % confidence level.

The hypothesis test should be performed using a 2-tailed T-test. The NS algorithm has failed the requirement if, for any test condition,

$$\bar{Y}_{ref} - \bar{Y}_{test} > \delta$$

where

$$\delta = T(0.05,\, 2N-2)\,\sqrt{\frac{S_{test}^{2} + S_{ref}^{2}}{N}}$$

and the subscripts $test$ and $ref$ denote the test condition and the corresponding reference condition, respectively, $N$ is the number of votes, and $T(0.05,\, 2N-2)$ is the inverse of the Student’s t-distribution with $2N-2$ degrees of freedom and probability 0.05.
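A check of this kind can be sketched as a pooled two-sample t-test with equal vote counts. The following is a minimal sketch under stated assumptions, not the normative procedure: the form of the margin, the use of 2N - 2 degrees of freedom, the series approximation of the t quantile and all function names are assumptions made for illustration. Only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def t_two_tailed(alpha: float, df: int) -> float:
    """Approximate two-tailed Student's t critical value.

    Uses a series expansion around the normal quantile, which is
    adequate for large df such as 2 * 96 - 2 = 190 (assumed here).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


def ns_fails(mean_test: float, sd_test: float,
             mean_ref: float, sd_ref: float,
             n_votes: int = 96, alpha: float = 0.05) -> bool:
    """True if the test MOS is significantly below the reference MOS."""
    t_crit = t_two_tailed(alpha, 2 * n_votes - 2)
    # Pooled standard error for two conditions with n_votes votes each.
    margin = t_crit * sqrt((sd_test**2 + sd_ref**2) / n_votes)
    return (mean_ref - mean_test) > margin


if __name__ == "__main__":
    print(round(t_two_tailed(0.05, 190), 4))
```

In this sketch the function would be called once for each of conditions 25-36 together with its reference condition, and the NS algorithm would be taken to fail if any call returns True.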