C9. Experiments 3a & 3b: Performances in Background Noise Conditions (Mod-CCR)

3GPP TS 06.77: Minimum Performance Requirements for Noise Suppresser Application to the AMR Speech Encoder

C9.1 Introduction

These experiments are designed to test requirements in the associated section of the Recommended Minimum Performance Requirements Specification ([1], TS GSM 06.77). Specifically, in a background noise environment the AMR with noise suppression should be preferred to the AMR without noise suppression in a certain number of conditions, and should provide a reasonable level of SNR improvement. Experiment 3a examines the performance of the noise suppression with the half-rate codec, while Experiment 3b examines it with the full-rate codec. Both experiments will use the Modified Comparison Category Rating (Mod-CCR, Note 1) method with a seven-point rating scale. Listeners will judge the relative quality of samples processed through the codec with noise suppression, compared with those processed without noise suppression (example instructions for listeners are given in Annex A.3). The samples will have background noise of various types and levels mixed into the source speech before processing through the codec.

The factors for Experiments 3a and 3b are presented in Table 9.1.

| Factor | Expt 3a | Expt 3b |
|---|---|---|
| Codec | AMR 5.9 kb/s | AMR 12.2 kb/s |
| Noise types | car (6 and 15 dB); street (9 and 18 dB); babble (9 and 18 dB) | car (6 and 15 dB); street (9 and 18 dB); babble (9 and 18 dB) |

Table 9.1: Factors for Experiments 3a and 3b

Note 1:

The standard Comparison Category Rating (CCR) method, described in Annex E of ITU-T Rec. P.800, is similar to the Degradation Category Rating (DCR) method (Annex D). Annex E states: "Listeners are presented with a pair of speech samples on each trial. In the DCR procedure, a reference (unprocessed) sample is presented first, followed by the same speech sample, which has been processed by some technique. In the CCR procedure, the order of the processed and unprocessed samples is chosen at random for each trial. Listeners use the seven-point CCR scale to judge the quality of the second sample relative to that of the first. The DCR and the CCR methods are particularly useful for assessing the performance of telecommunications systems when the input has been corrupted by background noise. However, an advantage of the CCR method over the DCR procedure is the possibility to assess speech processing that either degrades or improves the quality of the speech."

These experiments use a modified application of the standard CCR method. The Mod-CCR method uses processed reference samples (coded, but without noise suppression applied), whereas the standard CCR method uses unprocessed reference samples.

C9.2 Test Factors and Conditions

Three types of background noise will be used, at two different SNRs:

  • A car noise that is stationary both in level and in spectrum.
  • A street noise that is non-stationary in level, but fairly stationary in spectrum.
  • A babble noise that is fairly stationary in level, but non-stationary in spectrum.

The noise samples will be those utilised during the AMR Noise Suppression Selection Phase.

The codec is held constant for each experiment, with two SNR classes (‘SNR’ and ‘SNR+9dB’) per experiment. All of the noise types are used in each experiment. The noise samples will have been recorded in scenarios representative of the respective SNR value for each noise.
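The SNR values in Table 9.1 express the ratio of speech power to noise power in the noisy input. The mixing procedure actually required is the one defined in Section 6.3.7; the Python sketch below only illustrates the idea, assuming a plain mean-square power measure over the whole signal (a real set-up would more likely measure active speech level, e.g. per ITU-T P.56). The function names `mean_square_power` and `mix_at_snr` are hypothetical.

```python
import math

def mean_square_power(x):
    """Mean-square power of a sequence of samples (simplified level measure)."""
    return sum(s * s for s in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so that 10*log10(P_speech / P_noise) == snr_db.

    Simplification: plain mean-square power over the whole signal is used,
    not an active speech level measurement.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech")
    noise = noise[: len(speech)]
    # The scaled noise power must equal P_speech / 10**(snr_db / 10); solve for the amplitude gain.
    gain = math.sqrt(mean_square_power(speech) / (mean_square_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

# Each noise type is used at 'SNR' and 'SNR+9dB', e.g. car noise at 6 dB and 15 dB:
# low = mix_at_snr(speech, car_noise, 6.0); high = mix_at_snr(speech, car_noise, 15.0)
```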

The factors and conditions to be used in Experiments 3a and 3b are presented in Table 9.2. The expanded set of test conditions is given in Section 9.12.

| Main Codec Conditions | # | Notes |
|---|---|---|
| Noise Suppresser Candidates | 1 | |
| Codec | 1 | AMR |
| Codec Modes (HR/FR) | HR / FR | 5.9 kbit/s rate for Experiment 3a; 12.2 kbit/s rate for Experiment 3b |
| BERs | 0 | Clear channel, no transmission errors |
| Input level | 1 | Nominal: -26 dB relative to OVL |
| Acoustic Background Noise | 3 | Car, street, and babble noise |
| Background noise SNRs | 2 | Low, high for each (see Table 9.1) |
| Input Characteristic | 1 | GSM transmit filtered |
| **Codec references** | **#** | **Notes** |
| All Experiments | 1 | The same AMR rate without NS |
| **Other references** | **#** | **Notes** |
| Direct | | Nominal level, GSM transmit filtered |
| MNRU, Exp 3a and 3b | | Nominal level, GSM transmit filtered, Q = 12, Q = 4 |
| Ideal noise suppression simulation | | |
| **Common Conditions** | **#** | **Notes** |
| GSM Channel | 0 | No channel model |
| Number of talkers | 4 | 2 male + 2 female primary talkers |
| Number of speech samples | 28 | 7 sentence pairs per primary talker (6 for test, 1 for practice) |
| Listening Level | 1 | -15 dBPa (79 dB SPL) at ERP |
| Listeners | 24 | Naive listeners |
| Randomizations | 6 | 6 groups of 4 listeners |
| Rating Scale | 1 | CCR instructions |
| Replications | 1 | Original presentation only |

Table 9.2: Factors and conditions for Experiments 3a and 3b
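The MNRU reference conditions (MNRU-12 and MNRU-16 in Tables 9.3 and 9.12) add speech-correlated noise at a ratio of Q dB. As a rough illustration only, the Python sketch below applies the basic multiplicative MNRU relation y = x(1 + 10^(-Q/20)·N) described in ITU-T P.810, with N a zero-mean, unit-variance Gaussian sequence. The band-limiting filters of the full P.810 reference are omitted, and the function name `mnru` is hypothetical.

```python
import random

def mnru(x, q_db, seed=None):
    """Simplified MNRU: y = x * (1 + 10**(-Q/20) * N), N ~ Gaussian(0, 1).

    The input/output filtering of the complete ITU-T P.810 reference is
    deliberately left out; this only shows the multiplicative noise step.
    """
    rng = random.Random(seed)
    scale = 10 ** (-q_db / 20)  # noise amplitude relative to the signal
    return [s * (1.0 + scale * rng.gauss(0.0, 1.0)) for s in x]

# MNRU-12 and MNRU-16 versions of the same (hypothetical) sample list:
# ref_12 = mnru(samples, 12, seed=1); ref_16 = mnru(samples, 16, seed=1)
```

Because the noise is multiplied by the signal, silent passages stay silent and the signal-to-noise ratio is close to Q dB wherever speech is present.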

C9.3 Preliminary Conditions

The following 16 preliminary test conditions are recommended for presentation before proceeding to the test samples. The samples shall be presented in the random order given in Table 9.3.

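| Cond. | Presentation order | Noise | SNR (dB) | Reference | Processed: Ideal NS | Processed: Codec | Speech sample number |
|---|---|---|---|---|---|---|---|
| P1 | 9 | Car | 6 | Direct | | Direct | M1S07 |
| P2 | 5 | Car | 15 | AMR@x | | AMR@x | F1S07 |
| P3 | 12 | Car | 6 | MNRU-12 | | MNRU-16 | M2S07 |
| P4 | 13 | Car | 15 | MNRU-12 | | Direct | F2S07 |
| P5 | 2 | Street | 9 | AMR@x | | AMR@x | M1S07 |
| P6 | 4 | Street | 18 | MNRU-12 | | MNRU-16 | F1S07 |
| P7 | 8 | Street | 18 | MNRU-12 | | Direct | M2S07 |
| P8 | 16 | Babble | 9 | AMR@x | | AMR@x | F2S07 |
| P9 | 7 | Babble | 9 | MNRU-12 | | MNRU-16 | M1S07 |
| P10 | 1 | Babble | 18 | MNRU-12 | | Direct | F1S07 |
| P11 | 11 | Car | 6 | AMR@x | 4 | AMR@x | M2S07 |
| P12 | 3 | Car | 15 | AMR@x | 10 | AMR@x | F2S07 |
| P13 | 15 | Street | 18 | AMR@x | 4 | AMR@x | M1S07 |
| P14 | 6 | Street | 9 | AMR@x | 10 | AMR@x | F1S07 |
| P15 | 10 | Babble | 9 | AMR@x | 4 | AMR@x | M2S07 |
| P16 | 14 | Babble | 18 | AMR@x | 10 | AMR@x | F2S07 |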

Notes:

  • The bit rate for the AMR processing for the preliminary samples shall be the same as that used for the test samples, 5.9 kbit/s for Experiment 3a, 12.2 kbit/s for Experiment 3b.

Table 9.3: List of preliminary conditions

C9.4 Speech Material

The source speech material shall be as defined in Section 6.3 and will consist of the material used during the AMR Noise Suppression Selection phase. Each sample consists of two sentences, and only primary talkers are needed. For the four talkers, the following source material should be prepared:

  • seven samples for each talker, six for the test samples and one for the preliminaries;
  • each sample eight seconds long;
  • a unique sentence pair in each sample (i.e., no sentence pair is repeated across the talkers).

To reduce any speech material effect, the samples for each talker must be unique. For these experiments, these unique stimuli are not balanced across all conditions, candidates and subject groups. The same sample numbers for each talker are used for common conditions within a subject group and are changed across subject groups (the sample numbers are arbitrarily assigned to samples). For a given language, the same speech material must be used for both Experiments 3a and 3b. The noise material, and its mixing with the speech material, should be as defined in Section 6.8 and Section 6.3.7 respectively.

C9.5 Experimental Design

The design is based on a restricted randomization philosophy using six different randomizations, each of which is used with a group of four of the 24 listeners. This means that up to four subjects can perform the experiment simultaneously.

Each listener will hear all of the conditions four times, once with speech from each of the four talkers. Over the experiment as a whole, each of the conditions will be paired with six different samples from each of the four talkers. Each of the six groups of subjects will hear different combinations of source material and condition.
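The sample-number columns in Section 9.12 follow a cyclic pattern across the six subject groups (for example 1 2 3 4 5 6, or 4 5 6 1 2 3), which is what pairs each condition with all six samples of each talker over the experiment as a whole. A minimal Python sketch of that rotation, assuming the pattern is a pure cyclic shift; the names `sample_row`, `N_GROUPS` and `N_SAMPLES` are hypothetical.

```python
N_GROUPS = 6   # subject groups (one randomization each)
N_SAMPLES = 6  # test samples per talker

def sample_row(first):
    """Sample numbers used by subject groups 1..6 for a condition whose row
    starts with sample `first` (cyclic shift, e.g. first=4 -> 4 5 6 1 2 3)."""
    return [(first - 1 + g) % N_SAMPLES + 1 for g in range(N_GROUPS)]

# Rows that occur in Section 9.12, keyed by their first entry.
rows = {first: sample_row(first) for first in (1, 2, 3, 4)}
for first, row in rows.items():
    # Over the six groups, a condition meets every sample exactly once.
    assert sorted(row) == list(range(1, N_SAMPLES + 1)), first
```

Within one group the rows differ by noise type (car conditions start at 1, street at 2, babble at 3), so a listener hears different samples for different noises while every group still covers all samples.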

C9.6 Processing

Every condition is processed with each of the six samples of each of the four primary talkers. The actual samples to be used for each condition within each subject group are presented in Section 9.12, Test Conditions.

C9.7 Randomizations

The test shall be completed using the randomizations provided by the experimenter. There shall be six randomizations, one for each subject group, and the same randomizations shall be used for both experiments (3a and 3b). Each randomization will therefore be used by four of the 24 subjects. Each randomization is balanced across four blocks of 52 stimuli to eliminate long sequences of similar conditions or identical talkers, and the sequences shall alternate male and female talkers. Use of these randomizations will allow presentation order to be used as a factor in a global analysis, should that be necessary. The randomization shall be constrained to a randomized block design, which controls practice and fatigue effects that may occur over the course of a test session.
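The official randomizations are supplied by the experimenter; the Python sketch below only illustrates one way the stated constraints could be met. It assumes the 208 stimuli per group (52 conditions x 4 talkers) are split into four blocks in which every condition appears exactly once, that talkers are rotated across blocks in a Latin-square fashion, and that male and female talkers alternate. The function `randomize_group` and the talker labels are hypothetical.

```python
import random

TALKERS = ["M1", "F1", "M2", "F2"]  # hypothetical labels: 2 male + 2 female

def randomize_group(n_conditions=52, n_blocks=4, seed=None):
    """Return `n_blocks` blocks, each a list of (condition, talker) stimuli.

    Illustrative assumptions, not the official randomizations:
      * each condition appears once per block, and its talker rotates across
        blocks (Latin-square style), so it is heard with all four talkers;
      * male and female talkers alternate throughout each block.
    """
    rng = random.Random(seed)
    blocks = []
    for b in range(n_blocks):
        males, females = [], []
        for c in range(1, n_conditions + 1):
            talker = TALKERS[(c + b) % len(TALKERS)]
            (males if talker.startswith("M") else females).append((c, talker))
        rng.shuffle(males)
        rng.shuffle(females)
        # Interleave M, F, M, F, ...; the two lists have equal length when
        # n_conditions is a multiple of 4.
        blocks.append([stim for pair in zip(males, females) for stim in pair])
    return blocks

blocks = randomize_group(seed=1)
```

Because each condition occurs only once per block, identical conditions are never adjacent within a block, and the male/female interleaving prevents the same talker from appearing twice in a row.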

C9.8 Duration of the CCR Experiments 3a and 3b

Each trial consists of an eight-second reference sample, an eight-second test sample and five seconds of voting time, totaling 21 seconds. For each of the two experiments, the introductory block of 16 preliminary conditions x 21 seconds lasts 5.6 minutes. Each presentation set within an experiment consists of 52 conditions (A/B + B/A) x 4 talkers = 208 stimuli x 21 seconds, or 72.8 minutes, presented as eight blocks of 26 stimuli (9.1 minutes each), for 78.4 minutes of testing time per subject group per experiment. The total testing time for each experiment will be 470.4 minutes (about 7 hours and 50 minutes) if four listeners are tested at one time.

To reduce the effects of subject fatigue, the blocks should be separated by short comfort breaks.

Note that the above calculations do not include the time needed to give the subjects their instructions, or time taken for comfort breaks.
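The timing arithmetic can be checked with a few lines of Python, taking the trial length, stimulus counts and group sizes stated in this section as inputs. The helper name `session_minutes` is hypothetical, and, as noted above, the figures exclude instruction time and comfort breaks.

```python
TRIAL_S = 8 + 8 + 5  # reference sample + test sample + voting time (seconds)

def session_minutes(n_prelim=16, n_conditions=52, n_talkers=4, n_blocks=8):
    """Return (preliminary, main, per-block, per-group) durations in minutes."""
    prelim = n_prelim * TRIAL_S / 60
    main = n_conditions * n_talkers * TRIAL_S / 60
    return prelim, main, main / n_blocks, prelim + main

prelim, main, per_block, per_group = session_minutes()
sessions = 24 // 4  # 24 listeners tested four at a time
total_min = per_group * sessions
print(f"{prelim:.1f} + {main:.1f} = {per_group:.1f} min per group "
      f"({per_block:.2f} min blocks); {total_min:.1f} min per experiment")
# prints: 5.6 + 72.8 = 78.4 min per group (9.10 min blocks); 470.4 min per experiment
```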

C9.9 Votes Per Condition

In each of the two experiments, 24 listeners rate every condition with four talkers in each of two presentation orders (A/B and B/A), giving:

(24 subjects x 4 talkers x 2 presentations) = 192 votes per condition

From past experience with CCR tests, this is the minimum number of votes per condition needed to give enough statistical certainty to differentiate the performance of one candidate process from another candidate process over the conditions and against the references.
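The vote count can be reproduced with a short Python helper (the name `votes_per_condition` is hypothetical). It also shows that the "multiple" conditions of Section 9.12, such as conditions 4 and 5, which are presented only to subject groups 1 and 4, collect fewer votes than the 192 obtained for conditions heard by every group.

```python
def votes_per_condition(groups_presented, listeners_per_group=4, talkers=4, orders=2):
    """Votes collected for one condition over a whole experiment."""
    return groups_presented * listeners_per_group * talkers * orders

full = votes_per_condition(6)     # conditions heard by all six subject groups
partial = votes_per_condition(2)  # e.g. conditions 4 and 5, heard by groups 1 and 4 only
```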

C9.10 Test Procedure

Factors important for the experimental environment are specified in Sections 6.4, 6.5, and 6.6. As specified in Section 9.8, comfort breaks should be provided to reduce the effects of subject fatigue.

C9.11 Opinion Scale

The question asked of the subjects is based on the CCR Listening Quality Comparison Scale. The listening subjects will judge the quality of the second sample relative to the quality of the first sample. The subjects will listen to each pair of samples and, after both have been played, will be asked to give their comparative opinion. Annex A contains an example of the instructions for the subjects in English. The instructions may need to be changed to specify the method of data collection being used (button press, paper and pencil, etc.).
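For analysis, each vote has to be expressed as "processed relative to reference", whichever sample was played first. A minimal Python sketch, assuming the usual seven CCR labels and values of ITU-T P.800 Annex E (the wording in Annex A of this document may differ) and the common convention of reversing the sign when the processed sample came first, as in conditions 27-52. The names `CCR_SCALE` and `processed_vs_reference` are hypothetical.

```python
# Seven-point CCR scale: quality of the second sample compared with the first.
CCR_SCALE = {
    "Much Better": 3,
    "Better": 2,
    "Slightly Better": 1,
    "About the Same": 0,
    "Slightly Worse": -1,
    "Worse": -2,
    "Much Worse": -3,
}

def processed_vs_reference(label, processed_first):
    """Score of the processed sample relative to the reference sample.

    Listeners rate the second sample against the first, so the sign is
    flipped when the processed sample was presented first.
    """
    score = CCR_SCALE[label]
    return -score if processed_first else score
```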

C9.12 Test Conditions for Experiments 3a and 3b

| Cond. | Noise | SNR (dB) | Reference | Processed: Ideal NS | Processed: Codec | Speech sample number (subject groups 1 to 6) |
|---|---|---|---|---|---|---|
| 1 | Car | 6 | AMR@x | | AMR@x | 4 5 6 1 2 3 |
| 2 | Street | 9 | AMR@x | | AMR@x | 4 5 6 1 2 3 |
| 3 | Babble | 9 | AMR@x | | AMR@x | 4 5 6 1 2 3 |
| 4 | Car | 6 | MNRU-16 | | MNRU-12 | 4 – – 1 – – |
| 5 | Car | 6 | Direct | | MNRU-12 | 4 – – 1 – – |
| 4′ | Street | 9 | MNRU-16 | | MNRU-12 | – 5 – – 2 – |
| 5′ | Street | 9 | Direct | | MNRU-12 | – 5 – – 2 – |
| 4″ | Babble | 9 | MNRU-16 | | MNRU-12 | – – 6 – – 3 |
| 5″ | Babble | 9 | Direct | | MNRU-12 | – – 6 – – 3 |
| 6 | Car | 6 | AMR@x | 3 | AMR@x | 1 2 3 4 5 6 |
| 7 | Car | 6 | AMR@x | 6 | AMR@x | 1 2 3 4 5 6 |
| 8 | Car | 6 | AMR@x | 9 | AMR@x | 1 2 3 4 5 6 |
| 9 | Street | 9 | AMR@x | 3 | AMR@x | 2 3 4 5 6 1 |
| 10 | Street | 9 | AMR@x | 6 | AMR@x | 2 3 4 5 6 1 |
| 11 | Street | 9 | AMR@x | 9 | AMR@x | 2 3 4 5 6 1 |
| 12 | Babble | 9 | AMR@x | 3 | AMR@x | 3 4 5 6 1 2 |
| 13 | Babble | 9 | AMR@x | 6 | AMR@x | 3 4 5 6 1 2 |
| 14 | Babble | 9 | AMR@x | 9 | AMR@x | 3 4 5 6 1 2 |
| 15 | Car | 6 | AMR@x | | AMR/NS1@x | 1 2 3 4 5 6 |
| 16 | Street | 9 | AMR@x | | AMR/NS1@x | 2 3 4 5 6 1 |
| 17 | Babble | 9 | AMR@x | | AMR/NS1@x | 3 4 5 6 1 2 |
| 18 | Car | 15 | AMR@x | 3 | AMR@x | 1 2 3 4 5 6 |
| 19 | Car | 15 | AMR@x | 6 | AMR@x | 1 2 3 4 5 6 |
| 20 | Street | 18 | AMR@x | 3 | AMR@x | 2 3 4 5 6 1 |
| 21 | Street | 18 | AMR@x | 6 | AMR@x | 2 3 4 5 6 1 |
| 22 | Babble | 18 | AMR@x | 3 | AMR@x | 3 4 5 6 1 2 |
| 23 | Babble | 18 | AMR@x | 6 | AMR@x | 3 4 5 6 1 2 |
| 24 | Car | 15 | AMR@x | | AMR/NS1@x | 1 2 3 4 5 6 |
| 25 | Street | 18 | AMR@x | | AMR/NS1@x | 2 3 4 5 6 1 |
| 26 | Babble | 18 | AMR@x | | AMR/NS1@x | 3 4 5 6 1 2 |
| 27–52 | Reversed order of the reference and processed speech samples in cond. 1–26 | | | | | |

Notes:

  • AMR@x denotes AMR at bit rate x and AMR/NS1@x denotes the NS candidate at bit rate x, where x is 5.9 kbit/s for Experiment 3a and 12.2 kbit/s for Experiment 3b.
  • SNR (dB) denotes the speech-to-noise ratio of the background noise mixed into the speech.
  • 4 talkers are used for all conditions: 2 male and 2 female.
  • 6 speech samples (8 s each) are used for each talker.
  • The 'multiple' conditions 4, 4′, 4″ and 5, 5′, 5″ are each presented to only a subset of listeners; for example, conditions 4 and 5 are presented only to the first and fourth randomization groups.

C9.13 Statistical Analysis

The statistics to be reported from this CCR test are the mean CMOS scores ($\overline{\mathrm{CMOS}}$) and the standard deviations ($\sigma$) for all the conditions.
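Because the 192 votes per condition include both presentation orders, the reversed conditions 27-52 have to be folded back onto conditions 1-26 before these statistics are computed. A minimal Python sketch, assuming raw CCR scores (second sample rated against the first) stored per condition number, and the usual CCR convention of negating votes from the reversed order so that a positive mean means "processed preferred". The function name `condition_stats` is hypothetical.

```python
import statistics

def condition_stats(votes, n_base=26):
    """Return {condition: (mean CMOS, standard deviation)} for conditions 1..n_base.

    `votes` maps a condition number (1..2*n_base) to raw CCR scores, i.e. the
    second sample rated against the first. Condition c + n_base is the reversed
    presentation of condition c, so its scores are negated before pooling.
    """
    result = {}
    for c in range(1, n_base + 1):
        pooled = list(votes.get(c, [])) + [-v for v in votes.get(c + n_base, [])]
        if len(pooled) >= 2:  # a standard deviation needs at least two votes
            result[c] = (statistics.fmean(pooled), statistics.stdev(pooled))
    return result
```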

Additionally, the requirement in [1, Section 6.1.4] should be checked for conditions 15-17 and 24-26 using hypothesis tests of whether the mean CMOS score is greater than zero (the NS performance is preferred) or greater than or equal to zero (the NS performance is equivalent) at the 95 % confidence level.

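The hypothesis test should be performed using a one-tailed t-test. The NS algorithm has failed the requirement at level "preferred" for any test condition $i$ if

$$\overline{\mathrm{CMOS}}_i - \mathrm{CI}_i \le 0,$$

where

$$\mathrm{CI}_i = t_{0.05,\,N_i-1}\,\frac{\sigma_i}{\sqrt{N_i}},$$

and the subscript $i$ denotes the test condition, $\overline{\mathrm{CMOS}}_i$ and $\sigma_i$ are the mean CMOS score and its standard deviation, $N_i$ is the number of votes, and $t_{0.05,\,N_i-1}$ is the inverse of the Student's t-distribution with $N_i - 1$ degrees of freedom and probability 0.05.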
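The decision rule can be sketched in Python using only the standard library. Here the Student's t critical value is obtained by numerically integrating the t density (Simpson's rule) and bisecting; where SciPy is available, `scipy.stats.t.ppf(1 - alpha, df)` should give practically the same value. The "equal" rule in the sketch (fail if the mean plus the same half-width is below zero) is an assumption mirroring the "preferred" criterion, and all function names are hypothetical.

```python
import math
import statistics

def _t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def _t_cdf(x, df, steps=2000):
    """CDF at x >= 0: 0.5 plus the integral of the density over [0, x] (Simpson's rule)."""
    h = x / steps
    total = _t_pdf(0.0, df) + _t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * _t_pdf(k * h, df)
    return 0.5 + total * h / 3

def t_upper(alpha, df):
    """Critical value t with P(T > t) = alpha, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if _t_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def ns_check(votes, alpha=0.05):
    """One-tailed t-test on the pooled, sign-normalized CMOS votes of one condition."""
    n = len(votes)
    mean = statistics.fmean(votes)
    ci = t_upper(alpha, n - 1) * statistics.stdev(votes) / math.sqrt(n)
    return {
        "fails_preferred": mean - ci <= 0,  # NS not shown to be better than the reference
        "fails_equal": mean + ci < 0,       # assumed rule: NS shown to be worse
    }
```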

Similarly, the NS algorithm has failed the requirement at level “equal” if