Standard Deviation Question, How to do it? [Archive]


VivaLaViking

08-26-2005, 02:06 PM

In tournaments, because the number of entrants varies, I use the top percent of the finish. Using only 4 results (4%, 15%, 33% and 47%), how would I calculate the standard deviation and the variance?

Tom1975

08-26-2005, 02:34 PM

Wikipedia has a good article on it: linky (http://en.wikipedia.org/wiki/Standard_deviation). Excel also has functions for these: STDEVP and VARP for the population versions, STDEV and VAR for the sample versions. For your numbers I got a population standard deviation of 16.49 and a population variance of 272.18.

You do realize that a sample size of 4 is way too small for these numbers to have any meaning whatsoever.
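As a rough check on these figures, here is a minimal Python sketch using the standard-library statistics module. It computes both the population and the sample versions for the four finishes, because the two differ noticeably with only four data points. The variable names are illustrative.

```python
import statistics

# The four finishes from the original question, in percent.
finishes = [4.0, 15.0, 33.0, 47.0]

# Population versions divide by n (Excel: STDEVP, VARP).
pop_var = statistics.pvariance(finishes)
pop_sd = statistics.pstdev(finishes)

# Sample versions divide by n - 1 (Excel: STDEV, VAR).
samp_var = statistics.variance(finishes)
samp_sd = statistics.stdev(finishes)

print(f"population: variance = {pop_var:.2f}, sd = {pop_sd:.2f}")
print(f"sample:     variance = {samp_var:.2f}, sd = {samp_sd:.2f}")
```

The population figures come to roughly 272.19 and 16.50, and the sample figures to roughly 362.92 and 19.05.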

AaronBrown

08-26-2005, 02:39 PM

Let me see if I understand. You entered four tournaments. In the first one 4% of the entrants were still alive when you went broke, 15% in the second, 33% in the third and 47% in the fourth (quit while you're still above median). Is this correct?

Then you want to compute standard deviation. I assume you want to predict a confidence estimate for future finishes. You have to make some assumptions to do this. Not just the usual ones about all finishes being independent and from the same distribution, but you need to guess the shape of the distribution of finishes. Normal is not appropriate, because you can't do better than 0 or worse than 1.

A power distribution is a common model, and easy to use. You assume that the probability of fraction X or less of the players finishing ahead of you is X^a for some a. Your data suggest a = 0.2894. This implies a 67% chance of finishing in the top quarter, an 82% chance of finishing in the top half and a 92% chance of finishing in the top three-quarters. Of course, given only four tournaments and the arbitrary model, I wouldn't put a lot of weight on these statistics.
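To make the arithmetic concrete, here is a short Python sketch of the model just described, where the probability of having fraction x or less of the field finish ahead of you is x^a. The function name is made up for illustration, and a = 0.2894 is simply the estimate quoted in this post.

```python
def prob_finish_within(x: float, a: float) -> float:
    """P(fraction of the field finishing ahead of you <= x) under the X^a model."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a fraction between 0 and 1")
    return x ** a

a_est = 0.2894  # estimate quoted in the post

for x in (0.25, 0.50, 0.75):
    print(f"top {x:.0%}: {prob_finish_within(x, a_est):.0%}")
```

Rounded to whole percentages, this gives the 67%, 82% and 92% figures.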

VivaLaViking

08-26-2005, 03:40 PM

Exactly, you read my mind. (must not have a good Poker face lol) I do realize the sample size is too small to have significance.

[ QUOTE ]

A power distribution is a common model, and easy to use. You assume that the probability of fraction X or less of the players finishing ahead of you is X^a for some a. Your data suggest a = 0.2894. This implies a 67% chance of finishing in the top quarter, an 82% chance of finishing in the top half and a 92% chance of finishing in the top three-quarters. Of course, given only four tournaments and the arbitrary model, I wouldn't put a lot of weight on these statistics.

[/ QUOTE ]

Using the four finishes, 4, 15, 33 and 47 percent, how did you get those numbers? What is x and a?

VivaLaViking

08-27-2005, 10:37 AM

My first question is how the variable, a, is determined.

Assuming the upper quarter solution:
.04 + .15 + .33 + .47 = .99

1/4 = (.99)^a

Log(.99)/Log(.25) = a

This doesn't look correct so I must have not understood something or maybe I'm being "special" today.

VivaLaViking

08-27-2005, 11:41 AM

Sorry,

.25 = .99 ^ a

Log(.25)/Log(.99) = a

nmt09

08-27-2005, 03:55 PM

wow, I really start to feel dumb when I read this section of the forum.

Okay, so if I wanted to add a formula to my Excel spreadsheet that calculated the standard variance of my earnings (my swings), how exactly would I do it?

Is there an easy way?

AaronBrown

08-27-2005, 11:50 PM

Okay, we have to do a little math, and 2+2 doesn't have a LaTeX editor to show equations. It's going to be a bit messy.

The idea is that when you enter a tournament you can model your finish with a power distribution. There's no reason for that, it's just handy mathematically and not unreasonable. The chance of finishing with X fraction of the entrants or less ahead of you is X^a. If a = 1, this is just the uniform distribution. The chance that you're in the top 25% is 0.25^1 = 0.25. The chance that you're in the top 50% is 0.5^1 = 0.5.

If a > 1, you're worse than average. If a < 1, you're better than average. The closer a is to zero, the better you are. If a = 0.5, then the chance of being in the top 10% is 0.1^0.5 = 0.32, so you accomplish that in one third of your tournaments. The chance of being in the top half is 0.5^0.5 = 0.71.
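One way to get a feel for these numbers is to simulate them. The sketch below is only an illustration, not part of the method described here: it draws finishes from the X^a model by inverse-transform sampling (if U is uniform on [0, 1], then U^(1/a) has the required distribution) and compares the simulated frequencies with the formula for a = 0.5. The function name, seed and sample count are arbitrary choices.

```python
import random

def sample_finish(a: float, rng: random.Random) -> float:
    """Draw one finish fraction X with P(X <= x) = x**a (inverse-CDF method)."""
    return rng.random() ** (1.0 / a)

rng = random.Random(2005)  # fixed seed so the run is repeatable
a = 0.5
n = 100_000
draws = [sample_finish(a, rng) for _ in range(n)]

# Fraction of simulated tournaments finishing in the top 10% and top half.
top_10 = sum(x <= 0.10 for x in draws) / n
top_half = sum(x <= 0.50 for x in draws) / n

print(f"simulated top 10%:  {top_10:.3f}  (model: {0.10 ** a:.3f})")
print(f"simulated top half: {top_half:.3f}  (model: {0.50 ** a:.3f})")
```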

One way to estimate a from observed data is to take the average of the natural logarithms of the finishes (-1.7449) and solve:

ln(a)/(1-a) = -1.7449

You can use the Excel "Goal Seek" function to do this, or do an iterative solution. There's no closed form solution for a.
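For anyone without Excel, the iterative solution can be sketched in a few lines of Python. This uses plain bisection as a stand-in for Goal Seek, which is my choice and not something specified here. It relies on the fact that ln(a)/(1-a) rises steadily from minus infinity toward -1 as a goes from 0 to 1, so any target below -1 has exactly one root in that interval. The function names are illustrative.

```python
import math

finishes = [0.04, 0.15, 0.33, 0.47]
mean_log = sum(math.log(x) for x in finishes) / len(finishes)

def g(a: float) -> float:
    """Left-hand side of the equation in the post: ln(a) / (1 - a)."""
    return math.log(a) / (1.0 - a)

def solve_a(target: float, lo: float = 1e-9, hi: float = 1.0 - 1e-9) -> float:
    """Bisection on (0, 1), where g increases from -infinity toward -1."""
    if not target < -1.0:
        raise ValueError("no root in (0, 1) unless target < -1")
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if g(mid) < target:
            lo = mid  # g(mid) too low, so the root lies above mid
        else:
            hi = mid  # g(mid) too high, so the root lies at or below mid
    return (lo + hi) / 2.0

a_hat = solve_a(mean_log)
print(f"mean log finish = {mean_log:.4f}, a = {a_hat:.4f}")
```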

Once you get a = 0.2894, you can plug it back in to find out your chance of finishing in the top 10% (.1^0.2894 = 0.51). This may seem high given that you only did it one out of four times, but you get a lot of credit for never being below average.

With more data, you could make sure the power distribution assumption seems right, and you could get a more reliable estimate of a.

AaronBrown

08-27-2005, 11:52 PM

Standard variance doesn't sound right. There's standard deviation, which is the square root of variance. I'm not trying to pick on your terminology or make you feel stupid, I just want to make sure I know what you want. If you don't know the technical term, just tell us what you want to do with the number.

The other question is what data you have. Profit and loss per session? Per hand? Something else?

VivaLaViking

08-28-2005, 06:40 AM

These numbers are just data points of the percent of my finish.

Place Finished / Total Entrants

This was done in an attempt to normalize the data because the number of entrants is variable, although it's generally about 2,000.

I simply want to gain some insight into my performance, whatever might be the best metric.

[ QUOTE ]

A power distribution is a common model, and easy to use. You assume that the probability of fraction X or less of the players finishing ahead of you is X^a for some a. Your data suggest a = 0.2894. This implies a 67% chance of finishing in the top quarter, an 82% chance of finishing in the top half and a 92% chance of finishing in the top three-quarters. Of course, given only four tournaments and the arbitrary model, I wouldn't put a lot of weight on these statistics.

[/ QUOTE ]

This sounds about right to me but my instant problem is I can't determine your value, a, unless I can determine x.

What I gained from your post is that for the data points:

finished at percent 4, 15, 33, 47

there exists some power distribution function, x, where the quadrant (or other segmentation) is x ^ a.

So I attempted to calculate the upper quarter.

1/4 = x ^ a

and x is unknown to me and therefore a is unknown.

Did I miss something or is there a better suggestion?
Statistics is not my "strong suit", unfortunately it is not the type of math I have been doing.

08-28-2005, 09:54 AM

I have been trying to work out the variances of Online Hold'em inspector's best profile, and have done this spreadsheet:

http://www.cyberloonies.com/Solid%20Profit&Loss.xls

If you unhide the rows you can see the working out. You can see that on average you make 1.85 bb/hr at a typical table (tested against AI) but for 1000 hands there's quite a lot of variance - you couldn't be very confident that you would make money by this stage.

My sample size is 300. The graph looks a little bit like a normal distribution but not very much.

My question is: would I get a normal distribution of data if the sample size were bigger, or are there too many random variables involved to ever get a normal distribution?

AaronBrown

08-28-2005, 10:57 AM

I'm not being clear. X^a is a function for converting finishes (X) into probabilities (X^a). Once you have a (I'll get to how you get it in a minute) the function tells you your probability of any given finish. If a = 0.5 then your probability of finishing in the top 10% (X = 0.1) is 0.1^0.5 = 0.32. Your probability of finishing in the top half (X = 0.5) is 0.5^0.5 = 0.71.

So a is a measure of your skill. a = 1 means your finish in the tournament is random, you're as likely to come in first as last. a > 1 means you have less than an average chance of winning. a = 2, for example, means your chance of finishing in the top half (X = 0.5) is 0.5^2 = 0.25. So only one time in four will you even be in the top half. a < 1 means you're more likely than average to finish in the top half.

To estimate a, I asked the question, what a would make your actual finishes (0.04, 0.15, 0.33 and 0.47) the most likely? It takes a little math, but the answer comes down to:

ln(a)/(1-a) = [ln(0.04) + ln(0.15) + ln(0.33) + ln(0.47)]/4

This isn't a formula for a, but you can find the a that solves it using Excel or other methods. It happens to be a = 0.2894. You can check that:

ln(0.2894)/(1 - 0.2894) = [ln(0.04) + ln(0.15) + ln(0.33) + ln(0.47)]/4

Both sides come to about -1.7449.

So that's how you use a and how you find a.
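The check can also be done numerically. This is a minimal Python sketch that evaluates both sides of the equation for a candidate a and prints the difference. The helper name residual is made up for illustration, and the candidate used is the a = 0.2894 given earlier in the thread.

```python
import math

finishes = [0.04, 0.15, 0.33, 0.47]

# Right-hand side: the average of the natural logs of the finishes.
rhs = sum(math.log(x) for x in finishes) / len(finishes)

def residual(a: float) -> float:
    """ln(a)/(1 - a) minus the average log finish; zero at the solution."""
    return math.log(a) / (1.0 - a) - rhs

a_candidate = 0.2894  # value quoted earlier in the thread
lhs = math.log(a_candidate) / (1.0 - a_candidate)

print(f"right-hand side = {rhs:.4f}")
print(f"left-hand side  = {lhs:.4f}")
print(f"residual        = {residual(a_candidate):+.6f}")
```

A residual close to zero means the candidate solves the equation to the precision shown. Any other candidate can be tested the same way.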

AaronBrown

08-28-2005, 11:11 AM

I can't promise you a Normal distribution, but this looks very much like one given the sample size. The standard deviation appears to be about 3.25. The jagged peaks are not unusual, you've sliced the data pretty thin. Nothing is exactly Normal, and there are some minor deviations in your data, but nothing that would invalidate most methods based on Normal theory.

VivaLaViking

08-28-2005, 03:32 PM

Thank you for your patience. I have a simple query about the precedence of operation on the relationship you quoted. Is this the same as you cited?

ln(a)/(1 - a) = [ln(0.04) + ln(0.15) + ln(0.33) + ln(0.47)]/4

If so, I will try to isolate the variable, a:

ln(a)/(1 - a) = ln(0.04 * 0.15 * 0.33 * 0.47)/4

I am initially tempted to raise the L.H.S.'s and R.H.S.'s numerator to e. It's been a while (lol)
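For what it's worth, here is a hedged sketch of where exponentiating leads if it is applied to both sides in full and not only to the numerators. The right-hand side becomes the geometric mean of the four finishes and the left-hand side becomes a^(1/(1-a)). Since a then sits in both the base and the exponent, it still cannot be isolated with elementary algebra, which is consistent with the earlier remark that there is no closed-form solution. The check below plugs in a = 0.2894 from earlier in the thread.

```python
import math

finishes = [0.04, 0.15, 0.33, 0.47]

# exp of the average log equals the geometric mean of the finishes.
geo_mean = math.exp(sum(math.log(x) for x in finishes) / len(finishes))

# exp(ln(a) / (1 - a)) is the same quantity as a ** (1 / (1 - a)).
a = 0.2894
lhs = a ** (1.0 / (1.0 - a))

print(f"geometric mean of finishes = {geo_mean:.4f}")
print(f"a ** (1 / (1 - a))         = {lhs:.4f}")
```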