Modeling hand distributions from shown-down hands

jukofyork · #1 12-05-2005, 07:48 PM

I am currently thinking about this hypothetical modeling situation:

Player A
========
Preflop: Always calls preflop.
Postflop: Folds unless holds nuts.

Player B
========
Preflop: Only calls if holding top 13% of hands, else folds.
Postflop: Folds unless holds nuts.

Player C
========
Preflop: Only calls if holding at least 1 ace, else folds.
Postflop: Folds unless holds nuts.

Player A's pre-flop hand distribution will be easy to model from observation over time, from the simple fact that he calls 100% of the time [ie: we can use stats to work out P(model is always call) {assuming we already have the hypothesis that Player A always calls...}], and we never need to observe the hands he shows down to know his pre-flop hand distribution.

BUT: Player B's and Player C's pre-flop hand distributions will be vastly different, and if only the % of time they call pre-flop is used to model this distribution, then Player C's non-linear strategy will not be visible from this variable alone.

BUT (even more importantly!): If we attempt to model Player B's or Player C's pre-flop hand distributions based on the cards they show down (ie: WITHOUT any hypothesis about their actual pre-flop strategy), then our distribution will be VERY wrong based on the fact that stronger initial hole cards are more likely to make it to showdown.

For example, consider Player C's strategy:

If Player C calls with 'any ace' only, then at showdown (assuming he only ever calls down if holding the nuts postflop) we will observe many more AK's than A7's, more A2's than A6's, more AKs's than AKo's, etc. Even after (infinitely) many samples, our distribution will still be biased towards the stronger hole cards and thus incorrect.

Any ideas on how this observed distribution could be 'transformed' to account for stronger hole cards making it to showdown more often (again assuming no hypothesis, only observation)?

Also, if we knew the exact EV of every holding, could this be used for the 'transform' somehow?
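
One way to picture such a 'transform', purely as a sketch: if the chance of reaching showdown with each holding were somehow known, the observed showdown counts could be divided by it and renormalized. The counts and showdown probabilities below are made-up illustrative numbers, and `reweight_showdowns` is a hypothetical helper, not part of any existing tool.

```python
from fractions import Fraction

def reweight_showdowns(shown_counts, p_showdown):
    """Estimate the pre-flop distribution from showdown counts.

    shown_counts: hand -> number of times the hand was seen at showdown.
    p_showdown:   hand -> assumed P(reach showdown | hand was played).
    Each count is divided by its showdown probability (inverse-probability
    weighting) and the results are normalized to sum to 1.
    """
    weights = {h: Fraction(n) / p_showdown[h] for h, n in shown_counts.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Illustrative numbers only: AK reaches showdown twice as often as A7,
# so it is seen twice as often even though both are played equally.
shown = {"AK": 200, "A7": 100}
p_sd = {"AK": Fraction(2, 100), "A7": Fraction(1, 100)}
estimate = reweight_showdowns(shown, p_sd)
print(estimate)  # both hands come out at 1/2
```

A hand that never reaches showdown has no count to reweight, so it stays invisible to this kind of correction.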

Thanks in advance - Juk

Buccaneer · #2 12-05-2005, 08:58 PM

[ QUOTE ]
........ution will be VERY wrong based on the fact that stronger initial hole cards are more likely to make it to showdown.

[/ QUOTE ]

I may be very wrong but of the 13% top hands very few are the nuts after the flop. Example flop 2, 3, 5 rainbow the nuts are trip 5's, which you would rarely play (does 55 qualify?) and even if you had AA you would be required to fold your hand against this board.

Here is something that I have been thinking about lately. Say we have another player type, let's call him:

Player D
========
Preflop: Plays any two cards, calls any raise, rr, or cap.
Postflop: Continues to play from any position, through all calls, raises, rr, and caps. He raises any hand with paint, and raises any two with a frequency of 50%. He also has favorite hands which are always junk (72o, 39o, etc) but he has to raise them because one time he won a pot with them. He will fold unimproved hands at the river if they don't contain an A, K, Q, J, or his lucky 5. He will not fold if he is beat as long as he has outs. He never goes on tilt and says nh when someone beats him with a runner, runner. His congratulations are sincere and from the heart. He is truly happy for anyone that hits a runner runner.

Ok, now let's pretend that there are 6 Player D types in every game at the .50/1 level. We can probably call them fish instead of Player D. So we have 6 fish in the game and this causes a schooling effect. They school to protect themselves and confuse the fish that want to eat them. Now when they school there is a mysterious reflex that kicks in and they drop off a few at each street so that there are always two fish (Player D) flopping in the river with one of the predator fish. What they have mysteriously done is drop out one at a time and leave the strongest one of them in the pot. Science has yet to explain this primitive behavior but it is effective. So effective that the predators begin to fear the little fish.

Now what we have here is a shark that plays KQ in good position and 6 little fishes that play any two. If more than one shark tries to feed, the little fish raise so that he has to cold call to play. So we know the shark has a 2:3 chance to see a crap flop and the six fish have the same chance individually, but as a school they have an excellent chance of having one or more of them hit the flop hard. The flop is K, Q, 4. So going into the turn we have one shark and two fish that hit the flop and 1 fish that did not. The turn is an A and the shark bets out and the fish call and maybe raise. A couple of fish do fold and the river comes a very safe 3. Shark bets out and a fish shows him his 25o, which is his favorite hand. He doesn't even have to win with it, he feels good just having the opportunity to watch its magic work.

Now what I have been wondering about is this: If the shark has 4 outs to improve and the little fish have just 2 each, how do they seem to concentrate all those outs into 12 outs on each hand that can beat you? It is like they have an out magnifying glass and they are burning the sharks with it.

I would like to see what the numbers on that would look like after someone ran them through a computer.

gomer · #3 12-06-2005, 04:12 AM

If you know the strategies of all the opponents, as in this case, the transformation is pretty straightforward. In fact, there would be an (immensely complex) closed solution based on the probability of a given hand being the nuts at each stage. Why you would WANT a closed solution to a ridiculously arbitrary question is beyond me. Especially since you've already SPECIFIED Player C's strategy.

poincaraux · #4 12-06-2005, 12:27 PM

jukofyork:

Do you have a large hand-history database of your own hands? I only have ~8k right now. If you have a more reasonable size, why not try this out on your own results? If I get some time over the weekend, I'll try this with my puny database.

jukofyork · #5 12-10-2005, 04:00 PM

I think maybe this is my fault... I only posted about certain abstract situations in relation to a bigger idea that I am stuck on.

Let me try to re-formulate my exact problem I am considering:

1. We have infinite computational resources.
2. We wish to play 'maximally' rather than 'optimally' against a set of KNOWN opponent(s), each of whom we have previously observed playing and on whom we have collected an INFINITE amount of data, where each opponent has played against an INFINITE number of opponents with an infinite range of other playing styles.

The question (in relation to all forms of poker [with some degree of hidden information]) I ask then is:

In any given STATE where YOU have a decision to make, then assuming infinite computational resources (both space and time), can we theoretically compute via Minimax algorithm the EXACT EV of call or raise?

Now consider these abstract versions of poker:

A: Poker played without hidden information (ie: with all hands played face up)
---------------------------------------------------------------------------------------------------------------
This abstract game can be solved (with infinite computational resources), even for N players, using the Minimax algorithm, and there are no distributions of possible player holdings to consider (eg: no harder than backgammon, just using chance nodes, but still needing opponent modeling, etc).
NOTE: The question here is: If each of our opponents also thinks beyond the 1st level and they themselves use the models of other players when making their decisions (2nd level), and then they also consider models of other players (3rd level)... then will there actually be an Nth level where the tree stops growing (is the tree finite?) - I think for poker yes, assuming there are actually a finite number of actions a player can make in a hand (the 4 bet cap), because this will limit the amount of possible Nth-level thinking, making the tree finite.

B: Poker played where every time you fold you are forced to show your hand:
---------------------------------------------------------------------------------------------------------------
I think that theoretically this abstract game can also be solved, as in actual fact it is just a more complex version of Game A. Even though we do not know the exact hand our opponent holds, we should at any point in the game be able to state the exact probability distribution of his holdings. This now means we have an extra layer of chance nodes each time an opponent takes an action, with each chance node corresponding to the probability of this player having a certain hand. Again thinking about the point made about Game A, I think that because of the limit on the number of possible actions, this tree is also finite and fully observable, and we can thus search and find the exact EV of call and raise.

C: Real poker:
---------------------
Now if you read my original post about my three abstract games, it should make more sense why I am considering them.
The problem now is that even with infinite data on opponents, I think it is impossible to create the whole tree, and thus we can't find the exact EV of call and raise, because now we can neither be sure to have the exact hand distribution of each opponent nor be sure that our future predictions (via the Minimax search) will be correct, due to the fact that an opponent's fold decision could be non-linear (see my 3 original examples / counter-examples).

Is my reasoning flawed for Game A or Game B? Is the tree really fully observable (thus exact) and finite?

"In general, determining an exact opponent model from showdown observations is impossible -- it is strictly unknowable" - Darse, sums up my problem with Game C...
Juk

jukofyork · #6 12-10-2005, 05:11 PM

Sry, I must be tired today - I can't see how to edit the post, but maybe I need to clarify this:

(Also see: http://www.poki-poker.com/forums/vie...4b1b63e011bf4)

Why I think the search tree is actually NOT infinite:

1. Whenever any player makes a decision, they make this decision (non-deterministically: P(f)/P(c)/P(r), etc) based solely on the current 'state' of the game.
2. If this 'state' is finite, and we are limited in the number of possible actions we can perform, the tree must also be finite.
3. The current 'state' of the game includes both the 'game state' (there are only finitely many games/states in poker - no continuous variables) and the 'opponent state'.
4. The 'opponent state' is based on ALL of our past observations (infinitely many) of how an opponent themselves acted given a certain 'state' (ie: they also make their decisions based on both 'game state' and 'opponent state').
5. 'Opponent state' therefore involves a recursive "I think, you are thinking that I am thinking, you are thinking, ..." (Nth-level thinking), so it appears that we now have a point where an infinite tree can develop, making the 'opponent state' part of the current 'state' infinite (breaking statement 2. - see above). BUT, unless we are also playing vs opponent(s) who also have infinite computational resources (in which case our strategies and counter-strategies would converge to 'optimal' [this is not what I am considering here...]), there has to be some point where the opponent's Nth-level thinking stops (thus making the 'opponent state' finite, creating leaf nodes, and thus making the whole search tree finite).
* In other words, even though our opponents make decisions based on Nth-level thinking (making our state-space unimaginably large), they still make their decisions based on a finite 'state' (assuming opponents cannot think to an infinite level) (see 2.).
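
To make the 'game state' half of this argument concrete, here is a minimal sketch that enumerates every way one heads-up limit betting round can play out under a 4-bet cap. The function name `betting_sequences` and the one-letter action codes are my own assumptions for illustration; it says nothing about the 'opponent state' recursion, only that the capped betting part of the tree is finite.

```python
def betting_sequences(max_bets=4):
    """List all complete action sequences for one heads-up limit round.

    Actions: k = check, b = bet, c = call, r = raise, f = fold.
    max_bets counts the opening bet plus every raise (the cap).
    """
    done = []

    def walk(history, bets):
        if bets == 0:
            # Nobody has bet yet: a second check ends the round.
            if history == "k":
                done.append(history + "k")
            else:
                walk(history + "k", 0)
            walk(history + "b", 1)
        else:
            # Facing a bet: fold and call both end the round.
            done.append(history + "f")
            done.append(history + "c")
            if bets < max_bets:
                walk(history + "r", bets + 1)

    walk("", 0)
    return done

seqs = betting_sequences()
print(len(seqs))  # 17
print(max(seqs, key=len))
```

With the cap in place no sequence is longer than six actions, so the betting part of the tree has a fixed, small size per round.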

I know this is all hard to think about conceptually, but even considering infinite resources and data, a 'maximal' counter-strategy is not obviously possible (even for fully observable games such as chess, the same conceptual problems could arise if Nth-level opponent modeling were applied to try to create a 'maximal' counter-strategy [in the long history of chess computing, research has tended to only consider (approximate) 'optimal' strategies (vs current chess knowledge), rather than 'maximal' counter-strategies based on opponent flaws...]).

Juk

dfan · #7 12-12-2005, 12:44 PM

Returning to your original question -

For NL at least, if you had an infinitely large database of hands on a player who played the same preflop hand range throughout, you could determine this hand range precisely and simply.

Say, for example, you wanted to know what percentage of the time a player plays A9 UTG. Simply look at all hands where this player was UTG, one of the blinds was forced all-in due to low chips, there was no preflop raise and the flop was AA9. Every time the player has A9 in this situation he will show it down (assuming he doesn't fold flopped boats). If the player plays A9 100% of the time, then on about 0.5% of such instances (blind all-in, no raise, flop AA9) the player will show down his AAA99 boat, since 6 of the 1176 possible hole-card combinations are A9 once the flop shows AA9. If he plays A9 UTG 1/2 the time, he will show down AAA99 about 0.25% of the time, and so on.

Once you determine the percentage of time he enters UTG with A9, you could also figure out how often he folds it to a raise preflop. Again look at AA9 flops after a preflop raise. If he shows down AAA99 1/2 as often as when there was no preflop raise then obviously he folds 1/2 the time to this raise.

You could do this for the rest of his hand range.
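
As a rough sketch of the arithmetic behind this backtracking: before the flop, A9 is 16 of the 1326 two-card combinations (about 1.2%); once the flop shows AA9, only 2 aces and 3 nines remain among 49 unseen cards, so A9 is 6 of 1176 combinations (about 0.5%). The function names and the sample counts below are hypothetical, and the sketch ignores what the other players hold.

```python
from fractions import Fraction
from math import comb

def baseline_a9_given_aa9_flop():
    """P(a given player holds A9 | flop is AA9), before any strategy.

    With A, A, 9 on the board, 2 aces and 3 nines remain among the
    49 unseen cards, so 2 * 3 = 6 of the C(49, 2) = 1176 possible
    hole-card combinations are A9.
    """
    return Fraction(2 * 3, comb(49, 2))

def estimated_play_frequency(showdowns_with_a9, qualifying_hands):
    """Observed A9 showdown rate divided by the card-dealing baseline."""
    observed = Fraction(showdowns_with_a9, qualifying_hands)
    return observed / baseline_a9_given_aa9_flop()

baseline = baseline_a9_given_aa9_flop()
print(float(baseline))  # about 0.0051

# Hypothetical sample: 3 A9 showdowns in 1176 qualifying hands.
print(estimated_play_frequency(3, 1176))  # 1/2
```

Only the ratio of observed to expected showdowns matters, so the same division works for any other hand and flop once its baseline is counted.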

In limit, you could use a similar approach of looking at flops that would give him a boat. The problem is that often everyone will fold after he bets so no showdown. So I'm not sure how you could get the same deterministic backtracking to work. There is probably a clever way to do so though, I just can't think of it off the top of my head.

Wait, I just thought of such a method. You can compute how often he will be dealt A9 UTG and another player will be dealt AA. I think we can assume that whenever this occurs and the flop is AA9 it will always get shown down. So just look at that situation and see how often the player shows down his bad beat 2nd best boat.

roueful · #8 12-13-2005, 03:23 AM

Another way of doing it is to simply look at the types of flops (and turns) the players are calling with. If they only call with the nuts, it's going to be a pretty rare occurrence, and they'll usually make it to showdown. Assuming you can deduce this part of their strategy (only playing the nuts), you can narrow down what they were holding even when they don't make it to showdown.

For example, Player C playing A3o would only make it to showdown with quads or a board of AA332. But they'll call several types of flops. Simply checking what they called with on the flop, and noticing when it became non-nut and they folded, you could pretty much deduce all their starting hands from when they called.

Another way of doing it would be similar to the approach above, looking at all the one card flops 222-AAA and checking the kicker.

This is of course assuming the opponent is reacting in a predictable way, as summarized in your descriptions.

Your main question, though, I think, is if you can extrapolate an opponent's entire preflop/postflop strategy based only on the cards they show down. And knowing that they only call with nut hands, if you can somehow use EV weighting to check your assumptions about what they're folding.

It seems like the answer would still have to be in hands shown down, though. It won't actually be directly correlated with EV, since nut hands are different than profitable hands (eg suited hands without an ace).

For example, take a player who only plays pocket pairs. They're going to make the nuts (a set) on several flops, but have to fold on any board which is paired or has a possible straight or flush. Additionally, 22-99 can't possibly have the nuts on the river without making quads. 22-55 can't even call the flop without quads. So you're going to see TT shown down slightly more often than those hands, on the rare 2367T board, and JJ and above gradually more often. Based on those showdowns, the easiest thing to do is just check how often the player made quads, note that it's evenly distributed through pairs (you'd have to omit or account for hands where the board double paired), and make the projection that it's representative of their hand range.

jukofyork · #9 12-15-2005, 08:42 PM

[ QUOTE ]
Say, for example, you wanted to know what percentage of the time a player plays A9 UTG. Simply look at all hands where this player was UTG, one of the blinds was forced all-in due to low chips, there was no preflop raise and the flop was AA9. Every time the player has A9 in this situation he will show it down (assuming he doesn't fold flopped boats). If the player plays A9 100% of the time, then on about 0.5% of such instances (blind all-in, no raise, flop AA9) the player will show down his AAA99 boat, since 6 of the 1176 possible hole-card combinations are A9 once the flop shows AA9. If he plays A9 UTG 1/2 the time, he will show down AAA99 about 0.25% of the time, and so on.

Once you determine the percentage of time he enters UTG with A9, you could also figure out how often he folds it to a raise preflop. Again look at AA9 flops after a preflop raise. If he shows down AAA99 1/2 as often as when there was no preflop raise then obviously he folds 1/2 the time to this raise.

[/ QUOTE ]

Yes, I think this idea is the 'transform' I was looking for in my original post (ty!) and I think in practice, some kind of abstract version of this idea could be helpful (for NL maybe [in practice, with finite data] a model will converge quicker?).

[ QUOTE ]
The problem is that often everyone will fold after he bets so no showdown.

[/ QUOTE ]

Every time I think about this, it seems to come back to this question (again I think more in terms of limit, and trying to ignore stack size as a variable...).

[ QUOTE ]
Your main question, though, I think, is if you can extrapolate an opponent's entire preflop/postflop strategy based only on the cards they show down. And knowing that they only call with nut hands, if you can somehow use EV weighting to check your assumptions about what they're folding.

[/ QUOTE ]

Yes, this is my goal (to start with, thinking in terms of infinite resources, but ultimately wondering if this can be abstracted to some real level), but not just in the context of nut hands (these were just very extreme examples to show the non-linearity of the problem for different opponent strategies showing similar stats on observation).

Sry, I forgot to post here too (this discussion is going on in two threads, sry!), but I posted this in the poki-forums (see here):

[ QUOTE ]
Player Z
======
Pre-flop: Only calls with AA, KK, 22 and 77.
Post-flop: Always calls down to showdown with AA or KK, always folds 22 on the flop and then folds 77 on the turn.
(Assumes that the flop and turn are always bet)

Our prediction about this player BEFORE we see him take an action:

P(call)=4/221 {each pocket pair is 6 of the 1326 starting hands, ie: 1/221}
P(fold)=217/221

Our pre-flop distribution for this player AFTER we see him call:

P(AA)=1/4
P(KK)=1/4
P(<unknown>)=1/2

On the flop we now predict, BEFORE we see him take an action:
P(call)=P(AA)+P(KK)+P(<unknown>)=1/4+1/4+1/4=3/4
P(fold)=1-P(call)

Our flop distribution for this player AFTER we see him call:

P(AA)=1/3
P(KK)=1/3
P(<unknown>)=1/3

On the turn we now predict, BEFORE we see him take an action:
P(call)=P(AA)+P(KK)+P(<unknown>)=1/3+1/3+0=2/3
P(fold)=1-P(call)

Our turn distribution for this player AFTER we see him call:

P(AA)=1/2
P(KK)=1/2

NOTE: Addition of P(raise) doesn't really complicate the calculations, but makes a simple example harder to understand.

NOTE: <unknown> is a hand (hole cards) we have never seen go to showdown. We see the player call/raise with this 'unknown' hand, only to later see them fold it before showdown (ie: This is not really a hand, but a range of 'unknown' hands that they could have, but will fold later [most importantly = THE ACTUAL HAND RANGE DOES NOT SEEM TO MATTER...]). - Perhaps a better name is <never-known> or <never-seen>.

We now have a separate hand-independent prediction model for use in search, capable of predicting P(f)/P(c)/P(r) based purely on all the available 'state' information (apart from the actual hole cards a player holds...), along with an extra <unknown> hand classification added to the distribution.

Thinking back to all of the hypothetical players (A,B,C & D), this idea now seems to work well at countering their strategies (even the non-linear ones) in a game where certain folded hands we will never see (ie: Game C - Real Poker, see previous post(s)).
Is this maybe the best that can be done (this does not require a 'transformation' and it ignores the "intermediate distributions" to some extent [see terence's reply] - at least for the <unknown> hands)?
Is there anything fundamentally wrong with this idea in the context of my problem outlined in the previous posts?

Juk

[/ QUOTE ]
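
The Player Z numbers quoted above can be reproduced with a small Bayes-style update, sketched here under the stated assumptions (flop and turn always bet, no raises). The table `CONTINUES` and the helper names are hypothetical; "77" and "22" are tracked explicitly only so that their combined mass can be reported as the <unknown> class. Each pocket pair is 6 of the 1326 starting hands, so the pre-flop calling frequency comes out at 4/221.

```python
from fractions import Fraction

STREETS = ("flop", "turn")

# P(continue | hand) on each street for Player Z.
CONTINUES = {
    "flop": {"AA": 1, "KK": 1, "77": 1, "22": 0},
    "turn": {"AA": 1, "KK": 1, "77": 0, "22": 0},
}
SHOWN_DOWN = {"AA", "KK"}  # the only hands ever revealed

def p_call(dist, street):
    """Predicted P(call) on this street, before seeing the action."""
    return sum(p * CONTINUES[street][hand] for hand, p in dist.items())

def after_call(dist, street):
    """Hand distribution after the player is seen to call."""
    total = p_call(dist, street)
    return {hand: p * CONTINUES[street][hand] / total
            for hand, p in dist.items() if CONTINUES[street][hand]}

def unknown_mass(dist):
    """Probability lumped into the <unknown> class."""
    return sum((p for hand, p in dist.items() if hand not in SHOWN_DOWN),
               Fraction(0))

p_preflop_call = Fraction(4 * 6, 1326)  # four pocket pairs, 6 combos each
dist = {hand: Fraction(1, 4) for hand in ("AA", "KK", "77", "22")}
print("preflop P(call) =", p_preflop_call, "unknown =", unknown_mass(dist))
for street in STREETS:
    print(street, "P(call) =", p_call(dist, street))
    dist = after_call(dist, street)
    print(street, "unknown after call =", unknown_mass(dist))
```

The observer never needs to know that the <unknown> mass is 77 and 22; only its size on each street enters the predictions.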

But after consideration, I THINK that this idea is now almost equivalent to a "best-response strategy --terence" (again see the poki-thread):

[ QUOTE ]
I hardly read any of these last three posts (just sped through them, partially absorbing things) since they are so long and I unfortunately don't have any spare time these days. Anyway, I think in the last post you gave an example of what I was saying. Basically, the whole idea is the fundamental idea that Vexbot uses for its modelling (if I understood any of what you were saying - but again I didn't get much of a chance to read it). You can read about it in Aaron's thesis where he talks about Miximax, or in the UofA paper about search in poker which I can't remember the title of. Unfortunately, those were written with a focus on the search part and not the modelling part (at least, not the modelling part that you are interested in). My thesis will have a bit more on the modelling part, but I am finishing it up so there is nothing to read yet.

Hope any of what I said helps,
Terence.

[/ QUOTE ]

Why I think this now is because my 'Player Z' calculations actually ignored the fact that sometimes (in general/reality) a player would play say AA or KK, but fold it later in the hand [sometimes!] (so now the <unknown> class contains some instances of AA or KK, and the classes I called AA and KK actually hold P(AA at showdown) or P(KK at showdown)) - so I think it comes back to purely a "best-response strategy" (which does seem to model enough to counter all the abstract models A, B, C & Z without the non-linear decisions causing any problems...).

I guess this is gonna confuse me for some time to come, lol. But eventually it would be nice to come to some conclusions about a 'maximal' counter-strategy. In chess it is possible, if tried, to attempt to extrapolate (ignoring potential future bottlenecks) the point in the future where computers will have advanced enough to actually know the full search tree (at this point chess will be 'solved' and enter 'the hall of fame', leaving only 'maximal' strategy vs imperfect humans open for research). The same idea must also apply to poker to some extent (see UofO research on HU optimal players), but of more interest to me is whether any of these ideas can be abstracted enough to be of any help with current technology and finite data...

Juk

AaronBrown · #10 12-16-2005, 08:12 AM

As a purely theoretical matter, even if a player's strategy is constant and nonrandom, you could not deduce it even from an infinite set of observations. You obviously could deduce the situations in which the player goes to showdown, with a large enough sample you would observe all of them. This would allow you to make some inferences about folded hands, for example if a player ever shows A9, then you know she doesn't always fold A9 preflop.

But suppose a player always bets 72o preflop and folds it postflop, and always folds 73o preflop. You would have no way of distinguishing this from a player who does exactly the reverse. Neither hand will ever be revealed, and since they are exactly as common, there's no way to tell the difference based on frequency.

Therefore, unless a player shows down every hand, you will have to make some strategy inferences from reasonability assumptions, even with an infinite sample.
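
A toy illustration of the 72o/73o point, as a sketch only: the 'deck' is cut down to three hand types, and `observables` is a hypothetical helper that returns everything an outside observer could ever tabulate, namely how often the player enters the pot and which hands get revealed at showdown.

```python
from fractions import Fraction

def observables(strategy, combos):
    """What an observer can record, even with infinite data.

    strategy: hand -> (plays_preflop, reaches_showdown)
    combos:   hand -> number of card combinations for that hand
    Returns the pre-flop entry frequency and the frequency of each
    hand that is actually revealed at showdown.
    """
    total = sum(combos.values())
    enters = sum(Fraction(combos[h], total)
                 for h, (plays, _) in strategy.items() if plays)
    shown = {h: Fraction(combos[h], total)
             for h, (plays, shows) in strategy.items() if plays and shows}
    return enters, shown

# Toy universe of three hand types (combination counts as in hold'em).
combos = {"AA": 6, "72o": 12, "73o": 12}

# Player 1 enters with 72o and folds it later; Player 2 does so with 73o.
player_1 = {"AA": (True, True), "72o": (True, False), "73o": (False, False)}
player_2 = {"AA": (True, True), "72o": (False, False), "73o": (True, False)}

print(observables(player_1, combos) == observables(player_2, combos))  # True
```

Because 72o and 73o are dealt equally often and neither is ever shown, the two strategies leave identical records, which is why some outside assumption is needed to choose between them.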