Two Plus Two Older Archives  

#1 -- 05-08-2005, 04:57 PM
the shadow (Senior Member; Join Date: Mar 2005; Location: shadows abound all around; Posts: 150)
Testing ICM -- some questions for discussion

I've been rereading some threads on testing ICM, including:

eastbay's empirical equity study,

eastbay's post in ICM: Does it match the results, and

schwza's computer programmers (x-post).

For those who would like some background on the subject, check out Section 4.1 of my favorite threads list.

It seems to me that the simplest empirical test of a chip modeling hypothesis is whether the equity function -- the relationship between chip value (CEV) and equity ($EV) -- is linear in a heads-up freezeout. Sklansky seems to assert that the relationship is linear (TPFAP p. 151) ("[E]qual players in a symmetrical situation must win exactly in proportion to the size of their stacks."). If true, the hero's probability of winning equals the hero's chips as a percentage of the total chips: a hero holding 75% of the chips wins 75% of the time, and so on. As I understand it, ICM makes the same linearity assumption, at least in a freezeout format.
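Written out (in LaTeX notation), the linear hypothesis to be tested is simply:

[ code ]
P(\text{hero wins}) = \frac{c_{\text{hero}}}{c_{\text{hero}} + c_{\text{villain}}}
[/ code ]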

eastbay, zephyr, and others have questioned that hypothesis. JNash has proposed an S-Curve Hypothesis. He conjectures that the equity function is convex for small stacks and concave for big stacks, with the inflection point at the average stack size, and he speculates that the S-curve becomes more pronounced as the blinds grow relative to the average stack. If true, the probability of the hero winning would not necessarily equal the hero's chips as a percentage of the chips in play.

As far as I know, no one has tested whether the equity function is linear in a HU freezeout. Many other aspects of chip modeling, such as incorporating a skill factor, are interesting and await empirical analysis, but this one strikes me as the most tractable while still being interesting. Focusing on two-handed play avoids many of the computational and data-collection problems involved in three-or-more-handed play.

Even so, the result would still be interesting. As JNash suggests, under the S-curve hypothesis it would be correct (+$EV) for the small stack to take a perfect coin flip (50/50 in CEV terms) but incorrect (-$EV) for the big stack to accept the same flip. Put more strongly, it could even be correct (+$EV) for the small stack to take some -CEV gambles.
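To make that concrete with made-up numbers: suppose the true equity function were the purely hypothetical S-curve f(x) = x^2 / (x^2 + (1-x)^2), which is convex below half the chips and concave above, with its inflection at the average stack. Then a 25% stack gains $EV by flipping all-in, while the 75% stack loses $EV on the very same flip:

[ code ]
# Hypothetical S-shaped equity function (an illustration, not a measured curve):
# convex below a 50% chip share, concave above, with f(0)=0, f(0.5)=0.5, f(1)=1.
def f(x):
    return x**2 / (x**2 + (1 - x)**2)

small, big = 0.25, 0.75                       # chip fractions before the flip

# A 50/50 flip for the small stack's whole stack:
small_after = 0.5 * f(0.50) + 0.5 * f(0.0)    # 0.25, up from f(0.25) = 0.10
big_after   = 0.5 * f(0.50) + 0.5 * f(1.0)    # 0.75, down from f(0.75) = 0.90

# Even a 40/60 (-CEV) flip is +$EV for the small stack under this curve:
small_dog   = 0.4 * f(0.50) + 0.6 * f(0.0)    # 0.20, still above 0.10
[/ code ]

Under the linear model, by contrast, the 50/50 flip is exactly $EV-neutral for both stacks and the 40/60 flip is -$EV for the small stack -- which is precisely the difference a test would be looking for.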

So, this reasoning leads me to some questions for discussion. First, can anyone suggest a better (i.e., simpler and still interesting) aspect of chip modeling to test?

Second, if we were to test this aspect empirically, what data should be collected? It seems to me that it would include the following fields (a possible record layout is sketched after the list):
Data source
Site (Party, Stars, etc.)
Buy-in ($5, $10, $20, $50, $100, etc.)
Tournament no.
Chips for player A
Chips for player B
Winner (1 for A, 0 for B, or some similar convention)
Are there any other fields that should be collected?
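To make the proposed fields concrete, here is a minimal sketch of what one record might look like, assuming one row per completed heads-up freezeout (the field names and the example values are just placeholders):

[ code ]
# One record per heads-up freezeout; names and values are placeholders.
from dataclasses import dataclass

@dataclass
class HUFreezeoutRecord:
    source: str         # where the data came from (site export, hand histories, ...)
    site: str           # "Party", "Stars", etc.
    buy_in: float       # 5, 10, 20, 50, 100, ...
    tournament_no: str  # the site's tournament number
    chips_a: int        # player A's chips at the start of heads-up play
    chips_b: int        # player B's chips at the start of heads-up play
    winner_a: int       # 1 if A won, 0 if B won

example = HUFreezeoutRecord("hand history", "Party", 10.0, "12345678",
                            2500, 5500, 0)
[/ code ]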

Third, what should be the size of the data set? Should n=500, 500,000, 5 million, etc.?

Fourth, related to the second question, what statistical tests should be run to check the data for linearity or non-linearity? I recall several such tests from my econ days, but those dealt with time series, and I am not sure how they apply to this problem.
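One simple-minded approach (a sketch, not a full statistical treatment, and the bin edges are arbitrary): group the observations by the hero's chip fraction at the start of heads-up play, and in each bin compare the observed win rate to the chip fraction with an exact binomial test. A systematic pattern -- win rates too high below half the chips and too low above, say -- would point toward something like the S-curve.

[ code ]
# Binned test of the linear hypothesis P(win) = chip fraction.
# Assumes `data` is a list of (hero_chip_fraction, hero_won) pairs,
# one pair per heads-up freezeout.
from scipy.stats import binomtest

def binned_linearity_test(data, edges=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5,
                                        0.6, 0.7, 0.8, 0.9, 1.0)):
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [(x, won) for x, won in data if lo <= x < hi]
        if not in_bin:
            continue
        n = len(in_bin)
        wins = sum(won for _, won in in_bin)
        p_linear = sum(x for x, _ in in_bin) / n   # linear prediction for this bin
        result = binomtest(wins, n, p_linear)
        print(f"[{lo:.1f}, {hi:.1f}): n={n}, observed {wins / n:.3f}, "
              f"linear {p_linear:.3f}, p-value {result.pvalue:.3f}")
[/ code ]

A more formal alternative would be to fit a model that nests the linear one (a curve with one extra shape parameter, for instance) and use a likelihood-ratio test, which avoids the arbitrary binning.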

Finally, aside from collecting data, are there other ways of testing the hypothesis of a linear equity function in a HU freezeout?

I have some thoughts of my own, but thought it might be best to pose some questions for discussion. I welcome any suggestions and comments.

The Shadow

P.S. If you find it hard to visualize JNash's S-curve hypothesis, take a look at M.C. Escher's print Concave and Convex. A simple S-curve suddenly seems a lot easier. :)

EDIT: Links edited out to fix legibility problem.

#2 -- 05-08-2005, 05:02 PM
shejk (Junior Member; Join Date: Jun 2004; Posts: 4)
Re: Testing ICM -- some questions for discussion

Please edit for readability.

#3 -- 05-08-2005, 05:03 PM
the shadow (Senior Member; Join Date: Mar 2005; Location: shadows abound all around; Posts: 150)
Re: Testing ICM -- some questions for discussion

It was fine when I previewed it. I'm editing it now.

#4 -- 05-08-2005, 05:13 PM
Blarg (Senior Member; Join Date: Jun 2004; Posts: 1,519)
Re: Testing ICM -- some questions for discussion

OMG fix it. Not that I would probably understand it anyway.

#5 -- 05-08-2005, 05:26 PM
eastbay (Senior Member; Join Date: Nov 2003; Posts: 647)
Re: Testing ICM -- some questions for discussion

Shadow,

I think it's a good place to start. The difficulty I see is that any deviations from linearity may well be small, and small deltas will be hard to detect with any confidence without huge sample sizes.
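As a rough illustration of the sample sizes involved (the 1-percentage-point deviation and the 25% chip share are made-up numbers): detecting a deviation that small from a predicted win rate of 0.25, at the usual 5% significance level with 80% power, takes on the order of 15,000 tournaments in that one chip-share bin alone.

[ code ]
# Back-of-envelope sample size for detecting a small deviation from the
# linear prediction in a single chip-fraction bin (hypothetical numbers).
from math import ceil

p, delta = 0.25, 0.01               # predicted win rate, deviation to detect
z_alpha, z_beta = 1.96, 0.84        # 5% two-sided test, 80% power
n = ceil((z_alpha + z_beta) ** 2 * p * (1 - p) / delta ** 2)
print(n)                            # roughly 14,700 freezeouts in this one bin
[/ code ]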

BUT, I think it's worth trying anyway, and if you do find the S-curve phenomenon, fame and riches await you. Well, not really, but it would be a very cool result nonetheless.

Before we get into questions of sufficient sample sizes, etc., do you have something in mind for how to do the data mining?

eastbay

#6 -- 05-08-2005, 05:47 PM
the shadow (Senior Member; Join Date: Mar 2005; Location: shadows abound all around; Posts: 150)
Re: Testing ICM -- some questions for discussion

I have a few ideas.

First, the best place to start is with data from actual tournaments, and the best place to get that data is from the online sites themselves. If the data to be collected were clearly defined, a site might be willing to cooperate. Poker Room, for example, has posted the limit EV data that you and I have debated before. Since the data would not reveal any site-specific or player-specific information, one or more sites might just be willing to help out.

Second, even if no online site were willing to assist, it might be possible to collect the data from a group of players, especially if the data set required for meaningful analysis were not insurmountably large.

Third, if the online sites were not cooperating and sufficient data were not available from players, I would give some thought to taking data from bot vs. bot simulations. There are a number of increasingly sophisticated poker bots on the market. While I would have concerns about the validity of applying bot data to human play, it might be illuminating in this context: if the bot data were consistent with a null hypothesis of a linear equity function, my faith in that assumption would be stronger than if the data suggested a non-linear relationship.

Fourth, aside from collecting tourney or bot data, there are other ways to attack the problem. For example, online electronic markets have been used to predict presidential elections and economic indicators. One way of approaching the equity function would be to ask: how much would a third party be willing to pay for a stack of size x in a HU freezeout? There are ways, such as online markets, to determine that answer without having to collect and analyze a gazillion data points.

The Shadow (who appreciates your patience while I fixed the links)

#7 -- 05-08-2005, 06:03 PM
eastbay (Senior Member; Join Date: Nov 2003; Posts: 647)
Re: Testing ICM -- some questions for discussion

If you want an "empirical" study then, right, you have to collect the data, and online games are the only viable option. I am doubtful that you could get cooperation from the sites.

I think the best way to do it is with a program that data-mines observed games. That is a significant undertaking, but it has multiple obvious uses which may justify the "cost" of developing such a capability.

Collecting data from individual players has clear sample bias problems if you are looking for effects which occur in the mean of the pool of all players.

An alternative is a game model and simulation. I have done this with some quite simple "push/fold" strategies. The results are remarkably linear, but the strategies may not include the "real world" elements that generate deviations from linearity. I have some pretty well-defined ideas about how to improve such simulations to search for the S-curve phenomenon. But any such study is not really "empirical," and there will always be doubts about whether the strategy algorithms capture everything real players do that might generate deviations from linearity.
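For concreteness, here is a toy version of the kind of simulation I mean -- not the code I actually ran; the thresholds and the uniform "hand strength" model are crude stand-ins for real push/fold strategies and real preflop equities:

[ code ]
# Toy heads-up push/fold freezeout simulation (illustrative parameters only).
import random

PUSH = 0.3    # SB pushes with "hand strength" above this threshold
CALL = 0.5    # BB calls a push with strength above this threshold

def play_freezeout(hero, villain, sb=15, bb=30):
    """Play one toy push/fold freezeout; return True if the hero wins."""
    stacks = [hero, villain]      # index 0 = hero, 1 = villain
    sb_p = 0                      # who posts the small blind this hand
    while stacks[0] > 0 and stacks[1] > 0:
        bb_p = 1 - sb_p
        s_sb, s_bb = random.random(), random.random()
        if s_sb < PUSH:                       # SB open-folds: loses the small blind
            amt = min(sb, stacks[sb_p])
        elif s_bb < CALL:                     # BB folds to the push: loses the big blind
            amt = -min(bb, stacks[bb_p])
        else:                                 # all-in showdown for the shorter stack
            amt = min(stacks) if s_sb < s_bb else -min(stacks)
        stacks[sb_p] -= amt                   # amt > 0 flows from the SB to the BB
        stacks[bb_p] += amt
        sb_p = 1 - sb_p                       # alternate the blinds
    return stacks[0] > 0

def win_rate(chip_fraction, total=10000, trials=20000):
    hero = int(total * chip_fraction)
    wins = sum(play_freezeout(hero, total - hero) for _ in range(trials))
    return wins / trials

for frac in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"chip fraction {frac:.2f}: estimated P(win) = {win_rate(frac):.3f}")
[/ code ]

The interesting question is whether adding more realistic elements -- stack- and blind-dependent thresholds, blind increases, post-flop play -- bends the curve away from the chip fraction.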

Just a quick brain dump.

eastbay

#8 -- 05-08-2005, 06:10 PM
the shadow (Senior Member; Join Date: Mar 2005; Location: shadows abound all around; Posts: 150)
Re: Testing ICM -- some questions for discussion

Thx for your thoughts.

The data mining idea sounds promising, but would be beyond my skills.

As a lawyer, I'm continually surprised by how much information you can get by simply asking. Assume that an online site were willing to assist. What and how much information would you be looking for?

#9 -- 05-08-2005, 08:30 PM
Nottom (Senior Member; Join Date: Feb 2003; Location: Hokie Country; Posts: 4,030)
Re: Testing ICM -- some questions for discussion

I actually think the bot vs. bot idea is a very good one, since having two copies of the same bot play each other would completely remove any bias in favor of the better player. This would give an excellent control set for any chip ratio you might desire.

I'm not sure how fast a computer could run a bot v bot heads up match, but in theory you could generate a substantial amount of data relatively fast with this method.

#10 -- 05-08-2005, 08:42 PM
gumpzilla (Senior Member; Join Date: Feb 2005; Posts: 1,401)
Re: Testing ICM -- some questions for discussion

[ QUOTE ]
As a lawyer, I'm continually surprised by how much information you can get by simply asking. Assume that an online site were willing to assist. What and how much information would you be looking for?

[/ QUOTE ]

The size of the blinds at the time the match gets to be HU seems like it would be a relevant factor.

One issue that interests me about the data collection is how many data points one should take from each tournament. Let's say I play 20 hands heads up against my opponent; should we use this as 20 data points? My gut feeling is no, we shouldn't, because any deviations from average behavior become magnified.

As an oversimplified, unrealistic example of what I'm talking about: say I get heads up with blinds of 100, my stack at 2,000 and villain's at 8,000. Suppose he's ridiculously tight -- so tight, in fact, that he'll fold anything but aces. I should win this just about all of the time, so I'm going to skew the data horrendously if you take multiple data points from this tournament showing my small stack overcoming a major disadvantage and winning.