View Full Version : Question about normal distributions. **No poker content**

10-04-2005, 12:44 PM
Say we are given 2 well-defined (very large N) probability density functions: P1 and P2, with means mu1 & mu2 and SDs sigma1 & sigma2.

What is the probability that a randomly selected sample from P1 will be greater than a random sample from P2??

How would the confidence in our answer change if we knew the distributions were normal but our sample size was much smaller?

Any help is appreciated. Thanks

LetYouDown
10-04-2005, 01:02 PM
1.) Open textbook
2.) Read
3.) Don't cheat on your homework

Luzion
10-04-2005, 01:35 PM
[ QUOTE ]
Say we are given 2 well-defined (very large N) probability density functions: P1 and P2, with means mu1 & mu2 and SDs sigma1 & sigma2.

What is the probability that a randomly selected sample from P1 will be greater than a random sample from P2??

How would the confidence in our answer change if we knew the distributions were normal but our sample size was much smaller?

Any help is appreciated. Thanks

[/ QUOTE ]

If they are very large N, then you can use the normal approximation to figure this out, where

Pr(P1 > P2) = 1 - z[ (P1 - mu2)/sigma2 ]

The central limit theorem applies when the sample is fairly large, so you wouldn't be confident if your sample size was small.

10-04-2005, 01:54 PM
Thanks for your replies.

I'm not a student, nor do I have a textbook (which I should). If I did have a text, what chapter or topic would I look up to find more info?

I just scoured Wikipedia and MathWorld looking under approximation and 'normal approximation' plus all kinds of probability-related stuff... but nothing looks like it's answering this type of question.

It must be related to how much the areas under the two curves overlap, but I can't figure out how and it's been driving me nuts.

Please point me in the right direction.

Luzion
10-04-2005, 02:01 PM
If you are reading a book, look for the chapter on the normal distribution. The material on the Central Limit Theorem usually comes right after it.

If you are looking on Wikipedia or MathWorld, try searching for normal distribution, Central Limit Theorem, normal approximation, and z-score. I only know how to calculate these kinds of things using z-scores. Sorry.

pzhon
10-04-2005, 05:23 PM
[ QUOTE ]

What is the probability that a randomly selected sample from P1 will be greater than a random sample from P2??

[/ QUOTE ]
If the sample is small, it depends on the distributions. If the sample is large, you can use a normal approximation to each average.

The linear combination of two independent normally distributed random variables has a normal distribution. In particular, the difference is normally distributed. You want to know the probability that the difference is greater than 0.

If X and Y are independent and have standard deviations x and y, then the linear combination aX+bY has standard deviation sqrt((ax)^2+(by)^2).
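As a quick sanity check of that formula, here is a minimal Python sketch that compares sqrt((ax)^2+(by)^2) with a simulated value. The function names `combined_sd` and `simulated_sd`, the seed, and the example values 1.5 and 2 are illustrative assumptions, and normal draws are used only for convenience (the formula itself needs independence, not normality).

```python
import math
import random
import statistics


def combined_sd(a, x, b, y):
    """SD of a*X + b*Y for independent X and Y with SDs x and y."""
    return math.sqrt((a * x) ** 2 + (b * y) ** 2)


def simulated_sd(a, x, b, y, n=100_000, seed=0):
    """Estimate the same SD by drawing n independent pairs (X, Y).

    Normal draws are an assumption made for the simulation only.
    """
    rng = random.Random(seed)
    draws = [a * rng.gauss(0, x) + b * rng.gauss(0, y) for _ in range(n)]
    return statistics.pstdev(draws)


# The difference X - Y is the linear combination with a = 1, b = -1.
theory = combined_sd(1, 1.5, -1, 2.0)
estimate = simulated_sd(1, 1.5, -1, 2.0)
print(theory, estimate)
```

The two printed numbers should agree closely; the simulated one wobbles a little with the seed and the number of draws.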

So, as an example, suppose P1 has mean 1 and standard deviation 15. Suppose P2 has mean 0 and standard deviation 20. Suppose you have 100 samples from each with averages A1 and A2. A1 has mean 1 and standard deviation 15/sqrt(100) = 1.5. A2 has mean 0 and standard deviation 20/sqrt(100) = 2. A1-A2 has mean 1-0 = 1 and standard deviation sqrt((1.5)^2+(2)^2)=sqrt(6.25)=2.5. The value A1-A2=0 is then 1/2.5 = 0.4 standard deviations below the mean.
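The arithmetic in this example can be sketched in a few lines of Python using the standard library's `statistics.NormalDist`. The function name `prob_mean1_exceeds_mean2` is made up for illustration, and the sketch assumes the two sample averages are independent and close enough to normal for the approximation to hold.

```python
import math
from statistics import NormalDist


def prob_mean1_exceeds_mean2(mu1, sd1, n1, mu2, sd2, n2):
    """Approximate Pr(A1 > A2) for independent sample averages A1, A2."""
    se1 = sd1 / math.sqrt(n1)  # standard deviation of the average A1
    se2 = sd2 / math.sqrt(n2)  # standard deviation of the average A2
    # A1 - A2 is treated as normal with these parameters.
    diff = NormalDist(mu1 - mu2, math.sqrt(se1 ** 2 + se2 ** 2))
    # Pr(A1 - A2 > 0) is the area to the right of 0.
    return 1.0 - diff.cdf(0.0)


# Numbers from the example: means 1 and 0, SDs 15 and 20, 100 samples each.
p = prob_mean1_exceeds_mean2(1, 15, 100, 0, 20, 100)
print(round(p, 3))
```

With these inputs the result is the area to the right of a point 0.4 standard deviations below the mean, roughly 0.655.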

Luzion
10-04-2005, 06:47 PM
[ QUOTE ]
[ QUOTE ]

What is the probability that a randomly selected sample from P1 will be greater than a random sample from P2??

[/ QUOTE ]
If the sample is small, it depends on the distributions. If the sample is large, you can use a normal approximation to each average.

The linear combination of two independent normally distributed random variables has a normal distribution. In particular, the difference is normally distributed. You want to know the probability that the difference is greater than 0.

If X and Y are independent and have standard deviations x and y, then the linear combination aX+bY has standard deviation sqrt((ax)^2+(by)^2).

So, as an example, suppose P1 has mean 1 and standard deviation 15. Suppose P2 has mean 0 and standard deviation 20. Suppose you have 100 samples from each with averages A1 and A2. A1 has mean 1 and standard deviation 1.5. A2 has mean 0 and standard deviation 2. A1-A2 has mean 1-0 and standard deviation sqrt((1.5)^2+(2)^2)=sqrt(6.25)=2.5. That A1-A2=0 would be 1/2.5 = 0.4 standard deviations below the mean.

[/ QUOTE ]

Good solution. I didn't think mine out well. Just a small correction for yours, though, I think. You are finding the probability that A1 > A2, so A1 - A2 > 0; that is, you are finding Pr(A1 - A2 > 0). So you would set it up as 1 - z[ (0-1)/2.5 ] instead. So the solution wouldn't be 0.4 SD below the mean, but rather 0.4 SD above the mean, which gives a probability of about 0.655 that A1 is greater than A2.

AaronBrown
10-04-2005, 07:28 PM
Your question is not quite clear.

I think you mean you have two large POPULATIONS, rather than distributions, otherwise the large N does not make sense. You know the means and standard deviations of the populations. Or you might mean that you have two distributions, but the means and standard deviations are reliably estimated from large prior samples.

I then think you are selecting one item from each population independently and want to know the probability that the item from the first distribution will be larger than the item from the second. But you could mean that you are selecting more than one item from each, in which case I assume you would compare the means of the samples from the two distributions.

The independence or not of the samples is critical to this problem.

If we select one item from each population independently, then we know X1 - X2 has mean M1 - M2 and standard deviation [S1^2 + S2^2]^0.5. This has nothing to do with Normality, it will be true for any two populations. Normality allows us to put a precise number on the probability that X1 - X2 > 0. However, even without that assumption we can set limits on the value.
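To make the single-item case concrete, here is a rough Python sketch. It computes the probability under the Normality assumption, and also one possible distribution-free limit, the one-sided Chebyshev (Cantelli) inequality. That particular inequality is my choice for illustration and not necessarily the limit intended above; the function names and the example numbers (means 1 and 0, SDs 15 and 20) are likewise assumptions.

```python
import math
from statistics import NormalDist


def diff_mean_sd(m1, s1, m2, s2):
    """Mean and SD of X1 - X2 for independent X1, X2 (any distributions)."""
    return m1 - m2, math.sqrt(s1 ** 2 + s2 ** 2)


def prob_greater_if_normal(m1, s1, m2, s2):
    """Pr(X1 - X2 > 0), assuming both populations are Normal."""
    mean, sd = diff_mean_sd(m1, s1, m2, s2)
    return 1.0 - NormalDist(mean, sd).cdf(0.0)


def cantelli_lower_bound(m1, s1, m2, s2):
    """Lower bound on Pr(X1 - X2 > 0) with no Normality assumption.

    Cantelli's inequality gives Pr(D <= 0) <= sd^2 / (sd^2 + mean^2)
    when mean > 0, so Pr(D > 0) >= mean^2 / (mean^2 + sd^2).
    It says nothing useful when M1 <= M2, so 0.0 is returned then.
    """
    mean, sd = diff_mean_sd(m1, s1, m2, s2)
    if mean <= 0:
        return 0.0
    return mean ** 2 / (mean ** 2 + sd ** 2)


normal_prob = prob_greater_if_normal(1, 15, 0, 20)
lower_bound = cantelli_lower_bound(1, 15, 0, 20)
print(normal_prob, lower_bound)
```

When the difference in means is small relative to the combined standard deviation, as in these numbers, the distribution-free bound is very loose; it only becomes informative when M1 - M2 is large compared with [S1^2 + S2^2]^0.5.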