Intro Stats Class Project [Archive] - Two Plus Two Older Archives

10-29-2005, 10:14 PM

**This is cross-posted from Poker Theory; it was recommended that I post it here.

Sorry if this is repeating old posts, but I did search the topic and found nothing.

I'm in an intro statistics course and we were given an assignment to build a statistical model (i.e., get data, run a regression, analyse the R^2, check for multicollinearity, etc.). I thought it might be fun to try to build a model of what makes a successful low-limit hold'em player.

Here's what I was thinking about using for my variables:

The consensus seems to be that tight-aggressive play is the best way to win long-term.

I thought to test this I would use my poker tracker data and examine all players I have data for over 500 hands (too few? - I think so, but it's hard to get more)...

Anyway, I thought I would use BB/100 as the dependent variable, and VP$IP and Aggression Factor as independent variables, and see what Excel came up with.

Right now I don't have enough data to make a meaningful regression (I am data-mining 1-2 as I type this).

My questions are:

1.) Is it feasible to create a statistical model like this? (i.e., are there too many random factors, such as luck, to create a model like this?)

2.) What would you recommend using as independent and dependent variables?

3.) What do you think the minimum amount of observed hands should be for a player to be included in the model?

Any advice you could provide would be greatly, GREATLY appreciated.

jtr

10-30-2005, 12:10 AM

Don't use Excel, use R: it's free and much more powerful.

You are on the right track with what you've suggested already. Winrate in BB/100 is an obvious dependent variable, and anything you've got data on could be thrown in as an independent variable. VPIP, PFR, and postflop aggression all sound good.
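For anyone who wants to see the mechanics, here is a minimal sketch of that regression in plain Python (no R or Excel needed). The player numbers are made up purely for illustration, and the helper names (`solve_linear_system`, `fit_ols`, `r_squared`) are my own, not from any library. It fits BB/100 on VPIP, PFR and aggression factor by ordinary least squares and reports R^2.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of the variance in y explained by the fitted model."""
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical players: (VPIP %, PFR %, aggression factor, BB/100).
players = [
    (15.0, 8.0, 2.0, 2.5),
    (18.0, 10.0, 2.5, 3.0),
    (22.0, 6.0, 3.0, 1.0),
    (25.0, 5.0, 1.0, -1.5),
    (30.0, 12.0, 1.5, 0.5),
    (35.0, 4.0, 0.8, -4.0),
    (40.0, 3.0, 0.5, -6.0),
    (50.0, 2.0, 0.4, -9.5),
]
predictors = [p[:3] for p in players]
winrates = [p[3] for p in players]

coefs = fit_ols(predictors, winrates)
print("intercept, VPIP, PFR, AF:", [round(c, 3) for c in coefs])
print("R^2:", round(r_squared(predictors, winrates, coefs), 3))
```

With real data you would swap the `players` list for one row per opponent exported from your tracker.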

There's no right answer as to how many hands you need to have on someone before you could include them as one case in your analysis. 500 will do fine. The more hands you have on each person, the less noise there will be in that person's observed winrate, and so the smaller the error term in your model. Given the variance inherent in a 500-hand run, there will be a pretty hefty error term in your model, but so be it. You may still be able to draw some conclusions.
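To put a rough number on "pretty hefty", here is a small sketch of the standard error of an observed winrate. It assumes results in separate 100-hand blocks are independent, and the standard deviation of 15 BB per 100 hands is only a placeholder I picked for illustration, so substitute a value estimated from your own data.

```python
import math


def winrate_standard_error(hands, sd_per_100):
    """Standard error of an observed BB/100 winrate after `hands` hands.

    Assumes independent 100-hand blocks, so the error shrinks with the
    square root of the number of blocks played.
    """
    return sd_per_100 / math.sqrt(hands / 100)


ASSUMED_SD_PER_100 = 15.0  # hypothetical value, not taken from real data

for hands in (500, 5000, 50000):
    se = winrate_standard_error(hands, ASSUMED_SD_PER_100)
    print(f"{hands} hands: observed winrate is true winrate +/- about {se:.2f} BB/100")
```

The square root is the important part: to halve the noise on a player's winrate you need four times as many hands on that player.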

You may want to look at the possibility of a nonlinear relationship between, say, winrate and VPIP. Indeed, VPIP, PFR and aggression would probably all be sensibly looked at as second-order polynomial terms in order to capture the idea that there may be a "sweet spot" for each one.
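As a sketch of the "sweet spot" idea for a single variable: fit winrate = a + b*VPIP + c*VPIP^2 by least squares, and if c comes out negative the fitted curve peaks at VPIP = -b / (2c). The data below are invented for illustration, and `det3`, `fit_quadratic` and `sweet_spot` are hypothetical helper names of mine.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(x, y):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    n = len(x)
    mean_x = sum(x) / n
    # Centre x so the sums of powers stay small and the system stays stable.
    u = [xi - mean_x for xi in x]
    s1 = sum(u)
    s2 = sum(ui ** 2 for ui in u)
    s3 = sum(ui ** 3 for ui in u)
    s4 = sum(ui ** 4 for ui in u)
    normal = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [sum(y),
           sum(ui * yi for ui, yi in zip(u, y)),
           sum(ui ** 2 * yi for ui, yi in zip(u, y))]
    d = det3(normal)
    centred = []
    for k in range(3):
        # Cramer's rule: replace column k with the right-hand side.
        mk = [row[:] for row in normal]
        for i in range(3):
            mk[i][k] = rhs[i]
        centred.append(det3(mk) / d)
    a_c, b_c, c_c = centred
    # Convert from powers of (x - mean_x) back to powers of x.
    a = a_c - b_c * mean_x + c_c * mean_x ** 2
    b = b_c - 2.0 * c_c * mean_x
    return a, b, c_c


def sweet_spot(b, c):
    """Peak of the fitted parabola, or None if it does not curve downward."""
    return -b / (2.0 * c) if c < 0 else None


# Hypothetical data: VPIP (%) and observed BB/100 for seven players.
vpip = [12.0, 16.0, 20.0, 24.0, 30.0, 40.0, 55.0]
bb100 = [1.0, 2.2, 2.8, 2.5, 1.0, -3.0, -11.0]

a, b, c = fit_quadratic(vpip, bb100)
print("coefficients a, b, c:", round(a, 3), round(b, 3), round(c, 4))
print("fitted sweet spot for VPIP:", sweet_spot(b, c))
```

In a full model you would add the squared terms for PFR and aggression as extra columns alongside the linear ones, rather than fitting each variable on its own.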

Hope this helps.

Doc7

10-30-2005, 02:08 AM

[ QUOTE ]
Don't use Excel

[/ QUOTE ]

10-30-2005, 03:47 AM

Thanks very much for the tip. I'm downloading R right now! Pretty impressive when you can Google one letter and get what you are looking for. I'm feeling encouraged about this. Thanks!

(If anyone has any more tips re: the model they would be greatly appreciated!)

10-30-2005, 04:06 AM

For purposes of a class assignment this should be fine. I would suggest that you use the same number of hands played for all data points, so that each player's average BB/100 is based on the same number of observations.

The fact you believe luck is important shouldn't matter to the regression because it is random (by definition) and fits in the disturbance term of the model.
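A quick simulation makes the point concrete. This is only a sketch under stated assumptions: each simulated player's winrate is a straight-line function of VPIP plus independent random "luck", the true slope of -0.2 and the luck size of 10 BB/100 are numbers I made up, and the luck is uncorrelated with VPIP. Under those assumptions the fitted slope lands near the true one even though luck swamps any single player's result.

```python
import random


def simple_slope(x, y):
    """Least-squares slope of y on x for a one-variable regression."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_players(n_players, true_slope, luck_sd, seed):
    """Generate (vpip, winrate) lists where luck is pure random noise."""
    rng = random.Random(seed)
    vpip = [rng.uniform(10.0, 50.0) for _ in range(n_players)]
    winrate = [true_slope * v + rng.gauss(0.0, luck_sd) for v in vpip]
    return vpip, winrate


TRUE_SLOPE = -0.2  # hypothetical: each extra point of VPIP costs 0.2 BB/100
LUCK_SD = 10.0     # hypothetical size of the random luck component

for n_players in (50, 500, 5000):
    vpip, winrate = simulate_players(n_players, TRUE_SLOPE, LUCK_SD, seed=1)
    print(n_players, "players: fitted slope", round(simple_slope(vpip, winrate), 3))
```

Luck makes the estimate noisier, especially with few players, but it does not push the slope in any particular direction as long as it is unrelated to the variables in the model.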

One thing I think would be interesting that I have no idea how you would get data for is the percent of the time the player bets in each position.

I don't know much about Poker Tracker, but another interesting factor may be the percentage of the time a player cold-calls versus reraises a bettor.

Anyway, I think the model is going to be interesting even if it is simplistic and I would be interested in the results when you are done. Good Luck.

11-03-2005, 01:59 AM

OK,

I've made some progress, but I have a confusing issue:

VP$IP has not really worked out as expected. When I run a regression on the entire population, it comes out as an irrelevant variable, because there are too many people with a high VP$IP who skew the numbers with both extremely high and extremely low BB/100.

I thought I'd get around this problem by only taking people who had a significant number of hands, but when I do this, my entire sample (except for 3 outliers) is between 10 and 20 VP$IP. So my linear regression equation gives a positive coefficient to VP$IP, as those with 20 had a better BB/100 than those with 10; but the reality of VP$IP is that higher is not better. It just appears to be in this sample.
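This is what a restricted range does to a straight-line fit. In the sketch below I assume, purely for illustration, that true winrate is an upside-down parabola in VP$IP peaking at 22 (the peak and curvature are invented numbers, not estimates). A line fitted only to players between 10 and 20 slopes upward, while a line fitted across 10 to 60 slopes downward, even though the underlying curve is the same.

```python
def true_winrate(vpip, peak=22.0, curvature=0.01):
    """Hypothetical inverted-U relationship between VP$IP and BB/100."""
    return -curvature * (vpip - peak) ** 2


def simple_slope(x, y):
    """Least-squares slope of y on x for a one-variable regression."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Narrow sample: only players between 10 and 20 VP$IP (left of the peak).
narrow_vpip = [float(v) for v in range(10, 21)]
# Wide sample: players from 10 to 60 VP$IP in steps of 5.
wide_vpip = [float(v) for v in range(10, 61, 5)]

narrow_slope = simple_slope(narrow_vpip, [true_winrate(v) for v in narrow_vpip])
wide_slope = simple_slope(wide_vpip, [true_winrate(v) for v in wide_vpip])

print("slope using only 10-20 VP$IP:", round(narrow_slope, 3))
print("slope using 10-60 VP$IP:", round(wide_slope, 3))
```

So a positive coefficient inside the 10 to 20 band does not contradict "higher is not better" overall. It only describes the part of the curve the sample covers, and it should not be extrapolated beyond that range.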

Any suggestions?

My other variables are: Aggression Factor, Card Luck (calculated as % of times receiving a monster starting hand), and Bad Luck (calculated as % of times losing with a monster starting hand).

What do you think of those variables? Any other ideas? My adjusted R^2 is .62 - Think this is good enough?
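For reference, adjusted R^2 is the raw R^2 penalised for the number of predictors relative to the number of observations: 1 - (1 - R^2)(n - 1)/(n - k - 1). The sketch below shows how much the penalty depends on sample size. The raw R^2 of 0.70, the four predictors and the player counts are hypothetical values chosen for illustration, since the sample size is not given here.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 for a model with an intercept and n_predictors variables."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


RAW_R_SQUARED = 0.70  # hypothetical raw R^2
N_PREDICTORS = 4      # e.g. VP$IP, Aggression Factor, Card Luck, Bad Luck

for n_players in (15, 30, 100):
    adj = adjusted_r_squared(RAW_R_SQUARED, n_players, N_PREDICTORS)
    print(n_players, "players: adjusted R^2 =", round(adj, 3))
```

With only a handful of players per predictor the adjustment is large, so the same adjusted figure means more when it comes from a bigger sample.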

As always, any advice is greatly appreciated!

(Thanks to those who suggested using R. I gave it my best shot but the interface was beyond me!)