I have an idea for how we can quantify the EV gained by optimally considering blocker effects at certain nodes. I'm not sure if this sort of analysis has been done before, but I would be very curious to see the results if someone can carry out the details.
Consider an IP river spot where GTO dictates check or jam. The GTO strategy will be optimal in its blocker considerations when choosing the bluffing portion of its jamming range. A solver can give you the EV of this spot, which I'll call X.
My idea is to construct a strategy which matches the high level parameters of the GTO strategy, but which is blocker-oblivious. It can be constructed programmatically as follows. First, identify all N hands whose jamming-EV against a GTO opponent is negative when called - these we will identify as bluffing candidates. GTO allocates a certain amount of jamming probability mass to these hands - zero to some, and nonzero to others. Next, we redistribute that probability mass uniformly across the entire set of N bluffing candidates. That is our blocker-oblivious strategy. Note that this strategy matches the high level parameters of GTO: it jams with the same frequency, and it has the same value:bluff ratio. But by jamming with equal probability with all its bluffing candidates, the strategy is blocker-oblivious.
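Here is a minimal sketch of that redistribution step in Python. The hand labels, combo counts and jam frequencies are made-up placeholders rather than solver output, and weighting by combo count is my assumption about how "probability mass" should be measured so that the overall jam frequency is preserved.

```python
def blocker_oblivious(jam_freq, combos, is_bluff):
    """Spread the bluffing jam mass evenly over all bluff candidates.

    jam_freq: hand -> GTO jam frequency (0..1)
    combos:   hand -> number of combos of that hand in range (assumed weighting)
    is_bluff: hand -> True if the hand is a bluffing candidate
    """
    bluff_hands = [h for h in jam_freq if is_bluff[h]]
    # Total jam mass GTO gives to bluff candidates, measured in combos.
    bluff_mass = sum(jam_freq[h] * combos[h] for h in bluff_hands)
    bluff_combos = sum(combos[h] for h in bluff_hands)
    uniform = bluff_mass / bluff_combos if bluff_combos else 0.0
    # Value hands keep their GTO frequency; every bluff candidate gets the same one.
    return {h: (uniform if is_bluff[h] else f) for h, f in jam_freq.items()}


# Hypothetical toy range, for illustration only.
gto_jam = {"AA": 1.0, "A5s": 0.6, "KQo": 0.1, "76s": 0.0}
combos = {"AA": 6, "A5s": 2, "KQo": 6, "76s": 4}
is_bluff = {"AA": False, "A5s": True, "KQo": True, "76s": True}

oblivious = blocker_oblivious(gto_jam, combos, is_bluff)
mass_before = sum(gto_jam[h] * combos[h] for h in gto_jam)
mass_after = sum(oblivious[h] * combos[h] for h in oblivious)
print(oblivious)
print(mass_before, mass_after)
```

In this toy range both totals come out to 7.8 combos, so the jam frequency and the value:bluff ratio are unchanged and only the allocation among bluffs differs.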
We can analyze the EV of this strategy in two ways. One is to fix the OOP calling range to be GTO. The other is to set the OOP calling range to be maximally exploitative of our constructed strategy. I'll call these EV values Y and Z, respectively. Both make some sense to use as a comparison. I'm not familiar enough with solver software to know the feasibility of either computation.
The differences, (X-Y), and (X-Z), can be interpreted as the EV gained by incorporating blockers, against a stationary GTO opponent and against an exploitative opponent, respectively.
Extending this approach beyond IP river check/jam scenarios gets hairy.
The practical value of this sort of study would be understanding how much of the value of a GTO strategy comes from optimal choices of action frequencies and value:bluff ratios, and how much comes from allocating hands to actions, using blocker considerations.
June 7, 2020 | 3:10 p.m.
Thanks again for your thoughtful reply, and for engaging with me on this thread.
I gave some deeper thought to what is really nagging me, which I was having trouble articulating. I would like to give it another whirl, if you are willing to entertain my thoughts.
Consider constructing a river shove range in a particular spot. Your construction can mostly be summarized by just 2 parameters: what percentage of your range you bet (P), and within that, your value/bluff ratio (R). GTO dictates a certain optimal choice of these parameters, P' and R'. When we ask "how far am I from GTO?", or "how exploitable am I?", I believe these 2 parameters tell most of the story. An opponent looking to exploit you mainly cares about these 2 parameters of your strategy. Executing a good GTO approximation effectively demands that you estimate P' and R', and then determine an action with your current holding that will yield these values of P' and R' overall. This computation of P' and R' can be done explicitly (by devising a methodology to estimate P' and R' from high level principles), or implicitly (by devising a methodology to estimate each of 169 action probabilities, from which P' and R' will be emergent values).
To me, it seems like in this video, you teach something like this implicit approach. For example, in one spot, you show 169 cell values, and suggest flipping a 50/50 coin in one region and flipping a 20/80 coin in another region, seemingly based off visual examination. This implicit approach feels problematic to me because even if you are reasonably close to GTO within each individual cell, you could be seriously off on your overall P and R.
To illustrate, let's say that at a particular river jam spot, you are supposed to have 75% value and 25% bluffs, while betting with 20% of your range. This means that you are betting the top 15% of your overall range (15%/20% = 75%), while betting 5%/85% ~= 6% of the rest. GTO might achieve this 6% by distributing it across the bluffing subset of the 169 cells in some manner. If I try to implement that distribution via crude per-cell heuristics, I could easily see myself missing the mark to wind up with something more like 12% instead of 6%. This is especially so in a spot like choosing a bluffing range on a river with 4 or 5 spades, where there aren't many good reasons to choose one hand vs another as a bluff candidate. If I miss the mark with 12% instead of 6%, I have now deviated from the optimal R'=75% to 15/(15 + .85*12) ~= 59.5%. This will be exploitable. Is it significantly exploitable? I'm not sure - you probably have a better sense here. Point is, by trying to implement GTO by approximating individual cells, without recognizing that the utility of those cell choices lies in the overall P and R, I have missed the forest for the trees.
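The arithmetic in that example can be checked with a few lines of Python. The function names are just mine for illustration; P and R are the bet percentage and value share defined earlier.

```python
def bluff_freq_needed(P, R):
    """Fraction of the non-value part of the range that must be bet as a bluff."""
    value = P * R          # share of the whole range that is a value bet
    bluffs = P * (1 - R)   # share of the whole range that is a bluff
    return bluffs / (1 - value)


def realized_value_share(value, bluff_freq):
    """Value share of the betting range if the rest is bluffed at bluff_freq."""
    bluffs = (1 - value) * bluff_freq
    return value / (value + bluffs)


target = bluff_freq_needed(0.20, 0.75)       # 0.05 / 0.85
missed = realized_value_share(0.15, 0.12)    # 0.15 / (0.15 + 0.85 * 0.12)
print(round(target, 4), round(missed, 4))
```

This gives roughly 0.0588 for the target bluffing frequency and roughly 0.595 for the value share after doubling it to 12%.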
Even if I somehow manage to perfectly memorize the 169 values for a given node, if my actions on earlier streets were imbalanced, I could wind up seriously off on P and R. Hence my previous athlete mechanics analogy.
June 6, 2020 | 4:43 p.m.
I believe play was 7-handed. Deep-stacked Villain raised to 3.5BB from HJ, and CO called with 80BB stack. Hero called in the BB with a 381BB stack, for an 11 BB pot.
Flop came Td7h4c. HJ cbet for 5.5BB, CO called, and Hero check-raised to 23BB. HJ called and CO folded, leading to a 62.5BB pot.
Turn came Kd. Hero bet pot and got called, leading to a 187.5BB pot, and effective stacks behind of 292BB, for an SPR of 1.56.
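For anyone who wants to check the pot and SPR numbers, here is a quick Python sketch. It assumes a dead 0.5BB small blind (which is what makes the preflop pot 11BB) and that Villain covers Hero, so Hero's remaining stack is the effective stack.

```python
# Preflop: HJ raises to 3.5, CO calls, BB calls; SB assumed to fold 0.5BB dead.
pot = 3.5 * 3 + 0.5                 # 11.0

# Flop: HJ bets 5.5, CO calls 5.5 then folds, Hero raises to 23, HJ calls.
pot += 5.5 + 23 + 23                # 62.5

# Turn: Hero bets pot and is called.
turn_bet = pot
pot += 2 * turn_bet                 # 187.5

hero_stack = 381
hero_invested = 3.5 + 23 + turn_bet  # 89.0
behind = hero_stack - hero_invested  # 292.0
spr = behind / pot

print(pot, behind, round(spr, 2))
```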
River came the Jc.
I can say my hand, but I'm hoping to deeply understand how to play this river spot with my entire range, so I'll leave that out for now.
So here are some questions I am pondering. What is the % breakdown of Hero's range? I.e., what % is AQ, what % is Q9, what % is 98, what % is a set, and what % is worse than set? What are the corresponding numbers for Villain's range? What are viable methodologies/tools one can use to estimate this? If we input those ranges into a solver, what does the solver output for Hero's bet-call, bet-fold, check-fold, check-call, and check-raise ranges?
I'm hoping that the low SPR on the river along with the relatively constrained hand ranges given the board and actions will allow for a rigorous analysis here.
June 3, 2020 | 3:47 a.m.
Ben Sulsky Thanks for this well written reply.
I think one way to really rigorously model my concern would be to do a special type of node lock. Perhaps when GTO bets river, it has a flush with probability q and a non-flush with probability (1-q). But in my imperfect attempt to approximate GTO, I instead have a flush with probability p and a non-flush with probability (1-p), for some p != q. If we are able to perform a "partial node lock" that locks me into this particular choice of p (either over all river states/bet-sizes, or for particular sets of states/sizes), then we could look at the resultant EV-loss, f(p).
We know that f(q) = 0. What about f(q/2), or f((1+q)/2)? These represent overbluffing and underbluffing, respectively.
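To get a rough feel for those two test points, here is a toy Python sketch. It is not a node lock and not solver output: it assumes a single bet size into a fixed pot, a caller holding a pure bluff-catcher, and a bettor whose betting range beats the bluff-catcher with probability p. It only measures what the caller gains per bet faced by best-responding, and it does not model any cost on the underbluffing side, so it is not the f(p) a real partial node lock would report.

```python
def indifference_value_share(pot, bet):
    """Value share q of the betting range that makes a bluff-catcher indifferent."""
    return (pot + bet) / (pot + 2 * bet)


def caller_gain_per_bet(p, pot, bet):
    """Caller's best-response EV when facing a bet, relative to always folding."""
    ev_call = (1 - p) * (pot + bet) - p * bet
    return max(0.0, ev_call)   # the caller folds whenever calling loses money


pot, bet = 1.0, 1.0            # hypothetical pot-sized bet
q = indifference_value_share(pot, bet)
for p in (q, q / 2, (1 + q) / 2):
    print(round(p, 3), round(caller_gain_per_bet(p, pot, bet), 3))
```

With a pot-sized bet, q is 2/3. The caller gains nothing at p = q, gains a full pot per bet faced at p = q/2, and gains nothing from calling at p = (1+q)/2, where the best response is simply to fold.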
I would not be surprised if, for some of these really obscure extremes of GTO strategy, a small deviation from q resulted in a large EV-loss. We could term this the "sensitivity" of a node. If, hypothetically, the river 2x+ pot sized shove node has a sensitivity of 3BB/100 at a +/- deviation of 5% from q, while just using the simple 100% cbet strategy has a ~2BB/100 EV-loss...then is it really worth it to deviate from the simpler 100% cbet strategy? That is my concern.
In some spots in your video, you suggest approximations that imply that the sensitivity of those spots is low. I'm sure your intuitions are honed well enough that you are probably right. But it's difficult for a learning player to estimate GTO sensitivity in various spots. Without a good methodology to estimate sensitivity, it is difficult to know where to focus efforts. Perhaps the next generation of GTO tools could make this clearer - they could tell you optimal bet/check frequencies, and also display to you a graph with bet/check frequency on the x-axis and EV-loss on the y-axis (aggregated over hand buckets rather than as 169 separate cells).
June 1, 2020 | 9:42 p.m.
Ben Sulsky Sometimes, athletes will rebuild their mechanics: a basketball free throw, or a golf swing, etc. Such rebuilds have multiple interacting components - knees, back, arms, etc. Typically, the athlete had locally-optimized their results subject to certain constraints of their previously suboptimal form. The rebuild path takes the athlete into a region of subpar performance, since it's hard to fix all components at once, and fixing just one without addressing the others moves them away from their fine-tuned local optimum. What's worse than the hopefully-temporary subpar performance, though, is that he can no longer rely on the feedback-loop that took him to his current form; he knows just that he is acting counter to his intuitions/habits, and that his performance is predictably worsening due to the incompleteness of the rebuild. Because of the difficulties and risks involved, some athletes choose to never rebuild their mechanics - their current suboptimal form is good enough.
When trying to move our strategy from A to B, we can miss the mark and end up at C. We know that B is optimal, but if C is worse than A, and if it's generally hard to diagnose the differences between B and C that primarily lead to their EV difference, then this is an interesting analogy to ponder.
May 30, 2020 | 7:55 p.m.
What was eye-opening to me was to see (1) how small the EV-loss for IP is by going with a 100% cbet strategy, and (2) how much simpler the game tree is in the 100% cbet strategy.
I wonder how large the EV-loss will typically be if IP partially implements the GTO strategy, in some plausible manner. For example, perhaps IP memorizes and follows the flop strategy perfectly, but then follows that up with a badly balanced turn/river strategy. This could happen if, say, I watch this video and then decide I want to try to start implementing the components of the strategy piece-by-piece over time, so as not to be overwhelmed by trying to change everything at once.