In the wake of what is likely the greatest comeback in poker match-play history, spectators have begun to question whether foul play was involved in the Galfond vs VeniVidi challenge -- perhaps as the only way to make sense of what appears to be a miracle. As a tribute to all of the fascinating ways that randomness can shape our thoughts and behavior, this article aims to give the reader insight to help answer two major questions:
1. How unlikely was the event I just witnessed?
2. Should I still trust Phil Galfond?
The first model we will consider is one where Phil’s win rate is constant over the whole match.
The table below shows the probability of different events based on various bb/100 estimates of Phil’s win rate:
Here we have our first important discovery: the probability of the joint event (Phil loses 45 buy-ins and then comes back to win) stays at a similar order of magnitude regardless of Phil's win rate. This happens because as Phil's bb/100 gets lower, the downswing becomes more likely but the recovery becomes less likely. The table puts the likelihood of this match at about 3 in 10,000 -- a very rare event.
Now let's look at two other possibilities: one where Phil's win rate changes halfway through the match (the mindset model), and another where the match is somehow rigged (the doomswitch model). For the mindset model, we view the match through the lens of Phil being initially outclassed at -10bb/100 for the first half of the match, after which he takes some time off to prime his mind and comes back with a significant edge, averaging +10bb/100 for the second half. We could play with the win rates to better represent the exact match trend, but what matters here is that the line flips about halfway through and Phil comes back to win, as he did. For the doomswitch model, we again start Phil off as outclassed at -10bb/100, then hit the doomswitch at the halfway point, giving Phil a God-like +30bb/100 edge for the second half of the match.
Whichever model you believe in, we have clearly just witnessed a very unusual event. Even the model that assigns the highest probability sees the outcome of this match only around 2% of the time, compared to 0.4% for the mindset model and 0.03% for the constant win rate model. Again, we could change the before-and-after win rates of the mindset model to more accurately reflect the exact trend of the match, but it won't change the output much. The point here is to introduce mindset as a real factor when swings are this big, and to show that the mindset model results in a higher event probability than a constant win rate model.
At this point it would be tempting to draw an erroneous conclusion that since the doomswitch model predicts a 5x larger event chance than the mindset model, it is 5x more likely that there is cheating going on than just simple mindset tilt. This conclusion misses the point, which is that we have an extreme example of selection bias here. Remember, the only reason this model is being built or this article is being written is because this match is highly unusual! This is also related to the mistake many of us make when checking the probability of our downswings during a downswing. Selection bias is a huge factor. It's also why all of the mainstream poker variance calculators are unreliable tools when trying to answer these types of questions.
So is there any way to come to a reasonable conclusion, given that we are thinking about the likelihood of the match outcome after the fact?
We can try to use Bayesian reasoning, but this requires a prior probability for a mindset swing vs. cheating (let's leave out the constant win rate model for simplicity). If we set the prior at 1% cheating, then applying Bayes' rule:

P(Model M | Data) proportional to P(Data | M) × P(M)

we get:

P(cheating) = 4.8%, P(mindset switch) = 95.2%
Using Bayesian reasoning, if we thought there was a 1% chance of cheating prior to the match, we should increase the chance that we think something shady is going on to about 5% after witnessing the outcome.
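This update is easy to reproduce. The sketch below uses the 2% (doomswitch) and 0.4% (mindset) event probabilities quoted earlier as the likelihoods; the function name and structure are mine, not part of the original models:

```python
def posterior_cheating(prior_cheat, p_data_cheat=0.02, p_data_mindset=0.004):
    """Bayes update: P(cheating | data) given the two model likelihoods.

    prior_cheat is the prior probability of cheating; the mindset model
    gets the remaining prior mass.
    """
    prior_mindset = 1.0 - prior_cheat
    numerator = p_data_cheat * prior_cheat
    denominator = numerator + p_data_mindset * prior_mindset
    return numerator / denominator

print(posterior_cheating(0.01))    # ≈ 0.048, the 4.8% figure above
print(posterior_cheating(0.0001))  # ≈ 0.0005, used later with a fairer prior
```

Because the evidence favors the doomswitch model by a factor of five (2% vs 0.4%), any prior on cheating gets multiplied by roughly five -- which is why a 1% prior lands near 5%.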
Phil's reputation leading up to this match easily paints him as one of the most trustworthy members in the industry. It's also important to consider that due to his position, he has significantly more to lose than most people, which gives him significantly less incentive to do anything dishonest. Cheating is virtually never going to be worth the financial risk of losing his empire, even if money is the only thing that blows Phil's hair back.
A fairer prior for the likelihood of Phil cheating is probably more like 1 in 10,000, which gives us a posterior P(cheating) = 0.05% after the Bayesian update. That means that after witnessing the outcome of this match, your estimate of the chance that Phil cheated would increase from 1 in 10,000 to around 5 in 10,000. This leaves Phil with a trust rating that is probably still higher than the average person's best friend.
Of course models can be altered to give different conclusions, but the real point here is this:
If you are reasoning after the fact, you need extremely strong evidence to change your viewpoint from "Phil is an upstanding member of the community" to "This match was rigged, and Phil was involved". The probability of Phil being capable of cheating did increase slightly due to the extreme nature of the match, but not nearly enough to rationally alter your opinion of him as a person.
Hopefully this article has helped guide you to a feeling about the match that you're comfortable with. Here's to a monumental comeback and the value of a great reputation. Congrats Phil!
Thanks to "Holonomy" for making these models possible. For those interested in the details...
Each match has been modeled assuming the winnings in each 100-hand block are normally distributed with the stated mean and a fixed standard deviation of 170bb/100 (a conservative figure). We then run 1 million matches, filter for matches that have the features we want, and count how many there are to estimate the probability of that outcome occurring. This is obviously a model, so here are a couple of the obvious critiques:
You could argue we should be simulating every hand, and that we are not including kurtosis in the distribution. Higher kurtosis would make these results generally more likely, but we are seeking a model for the process, and absolute truth is clearly impossible to find; we are often more interested in the relative values of the various probabilities. Simulating in 100-hand batches slightly decreases the chance that we see a 45 BI downswing, but the occasional match that just touches that buy-in level for a couple of hands should not make much difference to the outcome.
Note: The doomswitch model is slightly less accurate, as we do not condition on the switch only being turned on when Phil is down, but it still provides a good illustration.