C7. Experiment 1: Degradation in Clean Speech (Pair Comparison Test)

3GPP TS 06.77: Minimum Performance Requirements for Noise Suppresser Application to the AMR Speech Encoder

C7.1 Introduction

This PC (Paired-Comparison) experiment was prepared to test the ‘No degradation in clean speech’ requirement in the Recommended Minimum Performance Requirements specification ([1], TS GSM 06.77). This PC experiment will be run for the whole set of bit rates of the base vocoder, in single and tandem connection.

The test methodology is direct, paired, forced choice comparison (i.e. A versus B test method with forced choice). The question that we are trying to answer with this test is not “What is the rank order of several coders?” but rather “Does the quality of the coder with noise suppression (+NS) meet or exceed the quality of the coder without NS for a given condition?” The direct comparison A/B test methodology can answer this question by considering the proportion (or percent) of the measures where the candidate was preferred over the standard. Each individual judgement is a binary decision. A rank order approach could be taken, as noted in the Handbook of Telephonometry [3] regarding Paired Comparisons, but the Handbook cautions: "In the scaling modulus is included the common standard deviation, which is, however, unknown and so does not permit calculating confidence limits for the scale positions obtained."

For the A/B experiment proposed here, 24 subjects each make two independent measures (A/B and B/A) of the preference of the candidate coder over the standard coder for four talkers (two male and two female) in each condition, with one repeat, so the effective N is 24 x 2 x 4 x 2 = 384. In order to accommodate the repeat measure, single sentence samples will be used. This provides the additional benefit of directly adjacent A/B comparisons during presentation. The repeat measure will be made using a unique second sentence.

C7.2. Test Factors and Conditions

The PC test will be run for the following basic vocoder conditions:

  • Bit Rates of 4.75 kbit/s, 5.15 kbit/s, 5.9 kbit/s, 6.7 kbit/s, 7.4 kbit/s, 7.95 kbit/s, 10.2 kbit/s and 12.2 kbit/s.
  • Single codec.

This results in a single PC experiment with clean source speech and no channel impairments. The speech material used in this experiment consists of 4 s samples (single sentence).

The following table (Table 7.1) shows the testing factors to be used in this experiment. Due to the limited number of conditions tested within this experiment, it is possible to design a more balanced test structure and introduce some dummy conditions where the perceived difference in quality within the pairs of stimuli should be obvious for the subjects. A list of test conditions is given in Table 7.3.

| Main Codec Conditions | # | Notes |
|---|---|---|
| Noise Suppresser Candidate | 1 | |
| Codec | 1 | AMR |
| Codec Modes (FR/HR) | HR, FR | All 8 AMR modes |
| BERs | 0 | Clear channel, no transmission errors |
| Input level | 1 | nominal: -26 dB relative to OVL |
| Acoustic Background Noise | 0 | None |
| Tandeming | 0 | No tandeming condition |
| Input Characteristic | 1 | GSM Filtered |

| Codec references | # | Notes |
|---|---|---|
| Test vocoders | 1 | AMR with NS |
| Reference vocoder | 8 | AMR at 12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15 & 4.75 |

| Other references | # | Notes |
|---|---|---|
| Direct | | Nominal level, GSM Filtered |
| MNRU | 2 | Q = 5 dB & 20 dB, other Q values in preliminaries |
| Ideal Noise Suppression | 0 | None |

| Common Conditions | # | Notes |
|---|---|---|
| GSM Channel | 0 | NO channel model |
| Number of talkers | 4 | 2 male + 2 female |
| Number of speech samples | 52 | 12/talker + 1 practice/talker |
| Sentences/sample | 1 | Single sentence stimuli |
| Listening Level | 1 | -15 dBPa (79 dB SPL) at ERP |
| Listeners | 24 | Naive Listeners |
| Randomizations | 6 | 6 groups of 4 listeners |
| Rating Scale | 1 | PC Instructions |
| Replications | 2 | Original Presentation + repeat w/ 2nd sentence |

Table 7.1: Factors and conditions for Experiment 1

C7.3 Preliminary Conditions

The following 16 preliminary test conditions are recommended.

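| Cond. | Presentation order | Reference Codec | Trans-codings | Processed Codec | Trans-codings | Talker and Sample Number |
|---|---|---|---|---|---|---|
| P1 | 5 | Direct | | MNRU-20 | | F1S13 |
| P2 | 1 | MNRU-18 | | MNRU-22 | | M1S13 |
| P3 | 3 | MNRU-19 | | MNRU-21 | | F2S13 |
| P4 | 7 | AMR-12.2 | 1 | AMR-12.2 | 1 | M2S13 |
| P5 | 6 | AMR-12.2 | 1 | AMR-5.9 | 1 | F1S13 |
| P6 | 2 | AMR-5.9 | 1 | AMR-5.9 | 1 | M1S13 |
| P7 | 4 | AMR-4.75 | 1 | AMR-7.95 | 1 | F2S13 |
| P8 | 8 | MNRU-5 | | MNRU-20 | | M2S13 |
| P9 | 14 | MNRU-20 | | Direct | | F1S13 |
| P10 | 10 | MNRU-22 | | MNRU-18 | | M1S13 |
| P11 | 12 | MNRU-21 | | MNRU-19 | | F2S13 |
| P12 | 16 | AMR-12.2 | 1 | AMR-12.2 | 1 | M2S13 |
| P13 | 13 | AMR-5.9 | 1 | AMR-12.2 | 1 | F1S13 |
| P14 | 9 | AMR-5.9 | 1 | AMR-5.9 | 1 | M1S13 |
| P15 | 11 | AMR-7.95 | 1 | AMR-4.75 | 1 | F2S13 |
| P16 | 15 | MNRU-20 | | MNRU-5 | | M2S13 |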

Table 7.2: List of preliminary conditions for Experiment 1

C7.4 Speech Material

Single-sentence samples are used. For the 4 talkers (2 male and 2 female) there are:

  • 13 stimuli per talker, each stimulus 4 s long with 1 sentence
  • 12 unique sentences per talker for the test plus one for practice

To reduce the speech material effect, each talker's samples must be unique. For this experiment, the unique samples are not balanced across all conditions, candidates and subject groups. The same sample numbers for each talker are used for common conditions within a subject group and changed across subject groups.

C7.5 Experimental Design

The design is based on a restricted randomization philosophy using 6 different randomizations, each one covered by a group of 4 of the 24 subjects. This means that up to 4 subjects can perform the experiment simultaneously.

Each subject will hear all of the conditions 16 times, four times with speech from each of the four talkers. Each of two stimuli for a talker will be presented in both the A/B and B/A order. Over the experiment as a whole, each of the conditions will be paired with twelve different samples from each of the four talkers. Each of the six groups of subjects will hear different combinations of source material and condition.

C7.6 Processing

Every condition has to be processed for each of the twelve stimuli of each of the four talkers. The actual samples used for each condition by each subject group are presented in Section C7.13, Test Conditions.

C7.7 Randomizations

Separate randomizations for each of the six subject groups shall be provided to reduce order effects and to minimize differences between the laboratories. There shall be six randomizations for the experiment, one for each subject group. Each one will therefore be used by four of the 24 subjects.
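As an illustration only, the six presentation orders could be generated along the following lines. The sketch assumes a plain seeded shuffle of all (condition, talker) pairs; the restrictions implied by the restricted randomization of C7.5 are not detailed in this section and are therefore not modelled. The function names `build_trials` and `make_randomizations` and the seed values are hypothetical.

```python
import random

TALKERS = ["F1", "F2", "M1", "M2"]   # two female and two male talkers
N_CONDITIONS = 80                    # conditions 1-80 of Table 7.3
N_GROUPS = 6                         # six subject groups of four listeners


def build_trials():
    """Return every (condition, talker) pair heard by one subject."""
    return [(cond, talker)
            for cond in range(1, N_CONDITIONS + 1)
            for talker in TALKERS]


def make_randomizations(base_seed=0):
    """Return one shuffled trial list per subject group.

    Assumption: an unrestricted shuffle with a fixed seed per group, so that
    each order is reproducible; a real test plan may add ordering constraints.
    """
    orders = {}
    for group in range(1, N_GROUPS + 1):
        trials = build_trials()
        random.Random(base_seed + group).shuffle(trials)
        orders[group] = trials
    return orders


if __name__ == "__main__":
    for group, trials in make_randomizations().items():
        print(group, len(trials), trials[:3])
```

Each group's list contains the same 320 trials in a different order, and all four listeners of a group would share that order.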

C7.8 Duration of the PC Experiment

Each presentation consists of a 4 s reference sample + 4 s speech sample + 4 s voting time, i.e. 12 seconds. For this experiment there are 16 preliminary conditions x 12 seconds, or 3.2 minutes, for an introductory block. The presentation set for the experiment consists of 40 conditions (A/B + B/A) x 2 repeats x 4 talkers x 12 seconds, or 64 minutes. The experiment is presented as the 16 preliminary conditions followed by the test itself, divided into several sessions, i.e. 67.2 minutes testing time per subject group. The 6 groups of 4 subjects require 7 hours and 30 minutes total testing time for the experiment (6 x approximately 1 h 15 min).

To reduce the effects of subject fatigue, sessions should be separated by short comfort breaks.

Note that the above calculations do not include the time needed to give the subjects their instructions, or for comfort breaks.
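The timing figures of this subclause can be reproduced with a few lines of arithmetic. The sketch below is only a cross-check; the function names are hypothetical, and the 75-minute allowance per group is taken from the "1 h 15" figure quoted above.

```python
SECONDS_PER_TRIAL = 4 + 4 + 4      # reference + speech sample + voting time


def block_minutes(n_trials, seconds_per_trial=SECONDS_PER_TRIAL):
    """Playing time of a block of paired presentations, in minutes."""
    return n_trials * seconds_per_trial / 60


def experiment_durations():
    preliminary = block_minutes(16)          # 16 preliminary conditions
    main = block_minutes(40 * 2 * 4)         # 40 conditions x 2 repeats x 4 talkers
    return {
        "preliminary_min": preliminary,
        "main_min": main,
        "per_group_min": preliminary + main,
        # 6 groups, each allotted about 1 h 15 min as quoted in the text
        "all_groups_hours": 6 * 75 / 60,
    }


if __name__ == "__main__":
    for name, value in experiment_durations().items():
        print(f"{name}: {value:.1f}")
```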

C7.9 Votes Per Condition

Every condition will have 24 subjects vote on four stimuli from each of four talkers, giving:

(24 subjects x 4 talkers x 4 Presentations) = 384 votes per condition

From past experience of PC tests, this is the minimum number of votes per condition needed to give enough statistical certainty to differentiate the performance of one candidate process from another candidate process over the conditions and against the references.
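The vote count can also be seen by enumerating the individual judgements for one condition. In this sketch the four presentations per talker are assumed to be the two presentation orders (A/B and B/A) crossed with the two sentences (original and repeat), as described in C7.1; the function name is hypothetical.

```python
from itertools import product

SUBJECTS = range(1, 25)              # 24 listeners
TALKERS = ["F1", "F2", "M1", "M2"]   # two female and two male talkers
ORDERS = ["A/B", "B/A"]              # both presentation orders
SENTENCES = [1, 2]                   # original sentence and repeat sentence


def votes_for_one_condition():
    """List every (subject, talker, order, sentence) judgement of one condition."""
    return list(product(SUBJECTS, TALKERS, ORDERS, SENTENCES))


if __name__ == "__main__":
    print(len(votes_for_one_condition()))
```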

C7.10 Test Procedure

Factors important for the experimental environment are specified in sections 6.4, 6.5, and 6.6. As specified in section 7.8, comfort breaks should be provided to reduce the effects of subject fatigue.

C7.11 Opinion Scale

The question asked of the subject is according to the Paired-Comparison binary scale. The specific wording is designed to evaluate the relative quality of the test sample in relation to the reference sample. In order to minimise presentation bias, the samples will be presented in both the A/B and B/A directions within the experiment. The subjects will listen to each pair of samples, and after presentation is completed, they will be asked to give their opinion. Annex A.1 contains an example of the instructions for the subjects in English.

C7.12. Statistical Analysis

The statistics to be reported for this pair-comparison experiment [4] are the proportion $P$ of subjects preferring the test stimulus over the reference stimulus (as defined in Table 2) for a total of $N$ votes per condition, the standard deviation $s$:

$$ s = \sqrt{\frac{P(1-P)}{N}} $$ (Eq.1)

and the upper and lower confidence limits, as calculated by:

$$ CL = P \pm z_{1-\alpha/2}\, s $$ (Eq.2)

where $z_{1-\alpha/2}$ is the standardized score for a normal distribution cutting off the lower $1-\alpha/2$ proportion of cases.

Additionally, the hypothesis to be tested is whether the preference for the noise reduction-enabled AMR codec is statistically different from the ideal proportion $\pi = 0.5$, i.e. that the AMR with noise suppression is equally preferred to AMR without noise suppression (for quiet background). In other words,

$$ H_0: P = \pi, \qquad H_1: P \neq \pi $$

The null hypothesis $H_0$ is tested using a z test where:

$$ z = \frac{P - \pi}{\sqrt{\pi(1-\pi)/N}} $$ (Eq.3)

Hence, the null hypothesis is rejected if

$$ |z| > z_{1-\alpha/2} $$

or accepted if:

$$ \pi - z_{1-\alpha/2}\sqrt{\frac{\pi(1-\pi)}{N}} \le P \le \pi + z_{1-\alpha/2}\sqrt{\frac{\pi(1-\pi)}{N}} $$ (Eq.4)

For a 95% confidence level, Equations 2 and 4 reduce to ($z_{1-\alpha/2} = 1.96$, $N = 384$):

$$ CL = P \pm 1.96\sqrt{\frac{P(1-P)}{384}} $$ (Eq.5)

$$ 0.45 \le P \le 0.55 $$ (Eq.6)
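A minimal sketch of this analysis for a single condition is given below. It assumes the usual normal approximation for a binomial proportion, i.e. a standard deviation of sqrt(P(1-P)/N) for the observed proportion and sqrt(0.5 x 0.5 / N) under the null hypothesis; the function name `pc_statistics` and its parameters are hypothetical and not part of the test plan.

```python
from math import sqrt
from statistics import NormalDist


def pc_statistics(n_preferred, n_votes, pi0=0.5, confidence=0.95):
    """Summarise one paired-comparison condition.

    n_preferred: votes preferring the test stimulus over the reference
    n_votes:     total number of votes N for the condition
    pi0:         ideal proportion under the null hypothesis
    """
    p = n_preferred / n_votes
    # Two-sided critical value, about 1.96 for a 95 % confidence level
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    s = sqrt(p * (1 - p) / n_votes)          # assumed form of the standard deviation
    limits = (p - z_crit * s, p + z_crit * s)

    s0 = sqrt(pi0 * (1 - pi0) / n_votes)     # standard deviation under H0
    z = (p - pi0) / s0
    return {
        "P": p,
        "s": s,
        "confidence_limits": limits,
        "z": z,
        "reject_H0": abs(z) > z_crit,
        "acceptance_region": (pi0 - z_crit * s0, pi0 + z_crit * s0),
    }


if __name__ == "__main__":
    # Hypothetical outcome: 200 of 384 votes prefer the NS-enabled codec
    print(pc_statistics(200, 384))
```

With N = 384 the acceptance region is roughly 0.45 to 0.55, so a condition is flagged only when the observed preference falls outside that band.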

C7.13. Test Conditions for Experiment 1

| Cond. | Reference Codec | Processed Codec | Trans-codings | Speech sample number (6 sequences) |
|---|---|---|---|---|
| 1 | AMR@12.2 | AMR@12.2 | 1 | 2 3 4 5 6 1 |
| 2 | AMR@10.2 | AMR@10.2 | 1 | 3 4 5 6 1 2 |
| 3 | AMR@7.95 | AMR@7.95 | 1 | 1 2 3 4 5 6 |
| 4 | AMR@7.4 | AMR@7.4 | 1 | 4 5 6 1 2 3 |
| 5 | AMR@6.7 | AMR@6.7 | 1 | 5 6 1 2 3 4 |
| 6 | AMR@5.9 | AMR@5.9 | 1 | 6 1 2 3 4 5 |
| 7 | AMR@5.15 | AMR@5.15 | 1 | 2 3 4 5 6 1 |
| 8 | AMR@4.75 | AMR@4.75 | 1 | 3 4 5 6 1 2 |
| 9 | AMR@12.2 | AMR@5.9 | 1 | 1 2 3 4 5 6 |
| 10 | AMR@4.75 | AMR@7.95 | 1 | 4 5 6 1 2 3 |
| 11 | DIRECT | MNRU Q = 20 dB | 1 | 5 6 1 2 3 4 |
| 12 | MNRU Q = 5 dB | MNRU Q = 20 dB | 1 | 6 1 2 3 4 5 |
| 13 | AMR@12.2 | AMR/NS@12.2 | 1 | 2 3 4 5 6 1 |
| 14 | AMR@10.2 | AMR/NS@10.2 | 1 | 3 4 5 6 1 2 |
| 15 | AMR@7.95 | AMR/NS@7.95 | 1 | 1 2 3 4 5 6 |
| 16 | AMR@7.4 | AMR/NS@7.4 | 1 | 4 5 6 1 2 3 |
| 17 | AMR@6.7 | AMR/NS@6.7 | 1 | 5 6 1 2 3 4 |
| 18 | AMR@5.9 | AMR/NS@5.9 | 1 | 6 1 2 3 4 5 |
| 19 | AMR@5.15 | AMR/NS@5.15 | 1 | 1 2 3 4 5 6 |
| 20 | AMR@4.75 | AMR/NS@4.75 | 1 | 3 4 5 6 1 2 |
| 21 – 40 | Reversed order of the reference and processed speech samples in cond. 1 – 20 | | | |
| 41 – 60 | Repeat of conditions 1 – 20 with Speech Sample Number +6 | | | |
| 61 – 80 | Reversed order of the reference and processed speech samples in cond. 41 – 60 | | | |

Notes:

  • 4 talkers are used for all conditions: 2 male and 2 female
  • 12 speech samples (4 s) are used for each talker
  • AMR@12.2 means AMR at 12.2 kbit/s
  • AMR/NS@12.2 means NS candidate x with AMR at 12.2 kbit/s

Table 7.3: Test conditions for Experiment 1
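The rules for conditions 21 to 80 can be applied mechanically to the 20 base conditions. The following sketch expands a two-row excerpt of Table 7.3. It assumes that "reversed order" means swapping which sample of the pair is played first and that "+6" is added to every entry of the sample sequence; the dictionary layout and the function name `expand_conditions` are hypothetical.

```python
# Excerpt of Table 7.3: cond -> (first sample, second sample, sample numbers
# for the six subject groups). Only two of the 20 base rows are shown.
BASE_CONDITIONS = {
    1: ("AMR@12.2", "AMR@12.2", [2, 3, 4, 5, 6, 1]),
    13: ("AMR@12.2", "AMR/NS@12.2", [2, 3, 4, 5, 6, 1]),
}


def expand_conditions(base, n_base=20, sample_offset=6):
    """Derive conditions 21-80 from the base conditions 1-20."""
    full = {}
    for cond, (ref, proc, samples) in base.items():
        repeat = [s + sample_offset for s in samples]
        full[cond] = (ref, proc, samples)                # 1-20: as listed
        full[cond + n_base] = (proc, ref, samples)       # 21-40: reversed order
        full[cond + 2 * n_base] = (ref, proc, repeat)    # 41-60: second sentence
        full[cond + 3 * n_base] = (proc, ref, repeat)    # 61-80: reversed repeat
    return full


if __name__ == "__main__":
    for cond, row in sorted(expand_conditions(BASE_CONDITIONS).items()):
        print(cond, row)
```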