Two Plus Two Older Archives  

#1
12-18-2005, 09:38 PM
naphinfitos
Member

Join Date: Apr 2005
Location: bostonish
Posts: 61
question about sample variance

When calculating this, you divide by n-1 rather than n. Why is this? Is there any rational explanation, or is it arbitrary? Thanks in advance.
#2
12-18-2005, 09:48 PM
Guest

Posts: n/a
Re: question about sample variance

The simple answer is that n-1 "works" to make it unbiased. The maximum likelihood estimator (MLE) for normally distributed data divides by n, but is biased. When you take the expectation of the MLE, you get (n-1)/n times the true variance, so dividing by n-1 instead of n makes the estimator unbiased. And people tend to like unbiased estimators. So it's not arbitrary, but there's not some particularly deep reasoning either.
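To make the bias concrete, here is a small sketch in Python that works out the expectation exactly for a toy case. The population (a fair coin showing 0 or 1) and the sample size of 2 are assumptions picked only to keep the enumeration tiny; nothing here depends on the data being normal.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: a fair coin that shows 0 or 1 (true variance 1/4).
values = [Fraction(0), Fraction(1)]
true_variance = Fraction(1, 4)
n = 2  # sample size


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample average."""
    avg = sum(sample) / len(sample)
    return sum((x - avg) ** 2 for x in sample)


# Every ordered sample of size n is equally likely, so a plain average
# over all of them is the exact expectation.
samples = list(product(values, repeat=n))
expected_ss = sum(sum_sq_dev(s) for s in samples) / len(samples)

expected_divide_by_n = expected_ss / n
expected_divide_by_n_minus_1 = expected_ss / (n - 1)

print(expected_divide_by_n)          # 1/8, half the true variance
print(expected_divide_by_n_minus_1)  # 1/4, the true variance
```

With only two observations the effect is at its largest: dividing by n gives half the true variance on average, while dividing by n-1 hits it exactly.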
#3
12-18-2005, 09:53 PM
naphinfitos
Member

Join Date: Apr 2005
Location: bostonish
Posts: 61
Re: question about sample variance

Thanks for the help. I was wondering why it's unbiased, though. My teacher refuses to tell us the reasoning for using n-1, and I was interested. Ty.
#4
12-18-2005, 11:29 PM
AaronBrown
Senior Member

Join Date: May 2005
Location: New York
Posts: 505
Re: question about sample variance

It's pretty simple, but before I go through it I would add that if it makes a difference to a real decision, you need more data. It's the kind of mathematical subtlety that is used to torture students, nothing you should worry about for practical reasoning.

Suppose you have a sample of N independent observations X1, ..., XN from a population with mean M. The sample average happens to be A. To compute the variance, you begin by summing the squared deviations from the average.

(1) Sum from i = 1 to N of (Xi - A)^2

What you really want is the sum of the squared deviations from the true mean:

(2) Sum from i = 1 to N of (Xi - M)^2

If you subtract (2) from (1) you get the error introduced by using the sample average instead of the true mean:

(3) Sum from i = 1 to N of [(Xi - A)^2 - (Xi - M)^2]
= Sum from i = 1 to N of [-2*A*Xi + A^2 + 2*M*Xi - M^2]
= Sum from i = 1 to N of [2*Xi*(M - A) + A^2 - M^2]

But the sum from i = 1 to N of Xi is N*A, and all the other terms are constants, so (3) is equal to

(4) 2*N*A*(M - A) + N*(A^2 - M^2)
= N*(M - A)*(2*A - A - M)
= -N*(M - A)^2

Note that this is always negative or zero, so the sum of the squared deviations from the sample average is always less than or equal to the sum of the squared deviations from the true mean (and it's equal only if the sample average happens to be exactly equal to the true mean). Looking at things another way, the sample average is the value that minimizes the sum of the squared deviations. If the true mean is any different, the sum taken around the sample average will underestimate the sum taken around the true mean.
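If you want to see the identity with actual numbers, a quick check in Python might look like this. The three data points and the "true mean" of 4 are made up for illustration; any other values should work the same way.

```python
# Made-up sample and an assumed true mean, chosen only for illustration.
xs = [2, 4, 9]
M = 4
N = len(xs)
A = sum(xs) / N  # sample average, 5.0

ss_about_average = sum((x - A) ** 2 for x in xs)  # expression (1)
ss_about_mean = sum((x - M) ** 2 for x in xs)     # expression (2)

difference = ss_about_average - ss_about_mean     # (1) minus (2)
shortcut = -N * (M - A) ** 2                      # closed form from (4)

print(ss_about_average, ss_about_mean)  # 26.0 29
print(difference, shortcut)             # -3.0 -3.0
```

The sum around the sample average (26) comes out smaller than the sum around the assumed true mean (29), and the gap matches -N*(M - A)^2.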

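The expected value of (2) is N times the true variance. Since the observations are independent, the variance of the sample average is the true variance divided by N, so the expected value of N*(M - A)^2 is exactly equal to the true variance. Putting these together we have: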

E(Sum of squared deviations from sample average) + Variance = N*Variance

so:

E(Sum of squared deviations from sample average) = (N - 1)*Variance

so the expected value of the sum of the squared deviations from the sample average divided by N - 1 is the variance.
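For anyone who would rather see it than prove it, here is a rough simulation sketch in Python. The normal population, its mean of 10 and variance of 4, the sample size of 5, and the number of trials are all arbitrary assumptions for the demonstration; the argument above does not require normality.

```python
import random

random.seed(0)  # fixed seed so reruns give the same numbers

TRUE_MEAN = 10.0      # hypothetical population mean
TRUE_VARIANCE = 4.0   # hypothetical population variance
N = 5                 # observations per sample
TRIALS = 20000        # number of simulated samples

total_ss = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_VARIANCE ** 0.5) for _ in range(N)]
    avg = sum(sample) / N
    # Sum of squared deviations from the sample average, as in (1).
    total_ss += sum((x - avg) ** 2 for x in sample)

mean_ss = total_ss / TRIALS  # estimates (N - 1) * variance

print(mean_ss / N)        # should land near (N - 1)/N * 4.0 = 3.2
print(mean_ss / (N - 1))  # should land near 4.0
```

The averages will wobble a little from seed to seed, but dividing by N should settle around 3.2 and dividing by N - 1 around 4.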