Excel – Random Numbers – and If statement

microsoft-excel-2007

In "A", rows 1-1000, I generate random numbers using RAND(). I want to pick numbers between say 1-100.

In "B" for 1000 rows I have =RAND()*(100-1)+1

In "C" for 1000 rows I use the numbers that I generated in "A", so =A1*(100-1)+1

The averages in B and C are the same, as expected.

I only want to use the values for 20% of the time, so for each of the 1000 rows:

For B, I use =IF(A1<0.2,B1,0)
For C, I use =IF(A1<0.2,C1,0)

If I do this, the averages for each of the two columns are vastly different.

Can anybody explain why?

Both approaches are using random numbers. The first approach uses two different random numbers (from column "A" and the one used in the "B" column calculation).

Best Answer

Columns A and B have the same uniform distribution, aside from the linear transformation, but are independant. B's probability density function (PDF) for A<0,2 is the same as the PDF for any other selection of A.

Columns A and C are not independant. In fact, they are completely deterministic. If you select rows where A<0,2, you are only selecting rows where C<20,8. The probability of finding a value of C greater than 20,8 within that selection has dropped to zero. Clearly, the distribution has changed.

For a more obvious example of this, consider rolling two dice (A and B) and their total (C). I'd expect to see an average of 3,5 in columns A and B, and an average of 7 in column C. If from this table I only select rows where die A landed on 1, I'd still see an average of 3,5 in B (independant), but merely 4,5 instead of 7 in column C (dependant).