Online Poker Strategy & Theory
If you expect your opponent to call your 3-bet with a wide range of marginal hands such as A-T and K-9, you should 3-bet with what is referred to as a linear range. You will find this strategy works best against weak players who are not capable of folding to a 3-bet before the flop once they have any amount of money invested. Going to the flop against players who are frequently dominated will work amazingly well for you.
If you expect your opponent to either fold or 4-bet (reraise again) when you 3-bet, you should 3-bet with what is referred to as a polarized range. Notice that when implementing this strategy, you will be calling with your hands that flop decently well, such as A-J, K-Q, and 8s-7s.
This strategy works well because when you 3-bet with a weak hand, you rarely expect to see a flop. With hands like A-J and K-Q, you typically want to see a flop, and calling ensures that happens most of the time. While 3-betting with marginal hands is an excellent way to make you more difficult to play against, against certain opponents, you should only 3-bet with your premium hands. If you expect your opponent to only raise with a premium hand before the flop, which is a trait some small stakes players exhibit, there is no point in bluffing because your bluff is almost certain to fail.
This will usually be the case when the initial raiser is in early position or known to be overly tight. Against these players, it is important that you do not overvalue hands like T-T and A-Q because if you 3-bet and your opponent does not fold, you are often in bad shape.
Your goal with your strong hands should not be to play them in a manner that forces you to fold. Instead, call the initial raise and see what develops after the flop.
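The decision rule described above can be sketched in a few lines. This is purely illustrative: the hand groupings and opponent labels are invented for the example, not solver output or the author's exact ranges.

```python
# Hypothetical hand groupings -- illustrative, not solver output.
LINEAR_3BET = ["AA", "KK", "QQ", "JJ", "TT", "AKs", "AQs", "AJs", "ATs", "KQs", "K9s"]
POLARIZED_3BET = ["AA", "KK", "QQ", "AKs", "A5s", "A4s", "76s", "65s"]

def choose_3bet_range(opponent):
    """Pick a 3-bet range based on how the opponent reacts to 3-bets."""
    if opponent == "calls_wide":
        # Calls 3-bets with dominated hands: value-heavy, linear range.
        return LINEAR_3BET
    if opponent == "folds_or_4bets":
        # Rarely flats a 3-bet: premiums plus bluffs; flat-call the middle.
        return POLARIZED_3BET
    if opponent == "only_raises_premium":
        # No point bluffing; 3-bet premiums only.
        return ["AA", "KK", "QQ", "AKs"]
    return LINEAR_3BET
```

Hands like A-J and K-Q that flop well but hate facing a 4-bet deliberately appear in neither the polarized 3-bet list nor the premium-only list; against those opponents they become calls.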
If you sit around and hope for someone to dump their chips to you, you will frequently be disappointed and end up blinding off.
Thanks for taking the time to read this blog post. If you enjoyed it, you will love my training site PokerCoaching. Check it out and sign up for your free 7-day trial. Let me know what you think!

This is great content! This is going to help me a lot. I also need to do more work on when I should call in position and not three-bet.
Little gives us good information; he is a good coach and I learn a lot from him and his poker play. I am thrilled to read this blog post because when I play cash games and make a 3-bet vs a reg who likes to call my 3-bets a lot, now I know I should 3-bet a linear range. Thank u Mr Jonathan.

For years I played the straightforward style referenced above. I could not understand why I always lost money.
I recently adopted the above strategy because of your great coaching. This is great content, I need to start employing a 3 betting strat. I am putting in work, thanks for laying some foundation Jonathan. Is the idea to establish an image early?
I use this strategy all the time. It is not to develop an image. It is simply good poker because your bets do not indicate obvious strength or weakness, leading your opponents to make mistakes. Also, position is relevant. We discuss this a ton at PokerCoaching.

Sure enough, old man calls BTN, SB calls, and we go off 4 ways.
Both sound pretty good. But yeah, a lot of live players fall into two camps: The other will do battle, but always waits until after the flop. And what happens when the flop bricks? What started off as a fairly simple adjustment (adding a 3-bet range at little cost) can quickly turn into a costly endeavor with postflop play.
You should usually bet 54s on most flops because you will have a range advantage, and it will often have some sort of draw. This is all discussed extensively at PokerCoaching. In a lot of low stakes amateur settings, likely carried over to the WSOP, you will see the following pattern: This puts a target on his back. Second, say we are on the button with part of our linear range. We should consider 3-betting fairly large to clear the field and potentially win the pot now.
When we get callers, we will have both relative and absolute position with a range advantage on most flops. If we are then 4-bet, we can confidently get out of the way. This move may also be valid when in the blinds, although I can see arguments for just calling to avoid bloating the pot OOP.
So, it kind of forces you to play tight because you have no implied odds against the short-stacks. How would you adjust to this environment? As for the polarized ranges, some parts of the range could also carry over to 2betting vs limps, however there are some weaker holdings that could benefit from just a call behind? I definitely do not think range construction for 3bet vs 2bet vs limp ranges are exactly the same, but we can come to similar conclusions?
The exact hands selected are not too relevant. Instead, you want to ask which hands are at the bottom of your calling range. Those hands become part of a polarized 3-betting range.
You will almost never get raised by someone who initially limped. If you do, you should usually fold as an exploitative play unless your opponents are lunatics. Sorry, I phrased that wrong, not backraise; I mean if we 2-bet vs a limp or limps and someone behind us 3-bets…. Also, it depends on your pot odds. How can we know which one will respond?
But then, reference is made to responding to the initial raiser and based on the way they are expected to respond, applying an appropriate aggressive strategy. So is this blog describing an in position 4betting strategy then? Am I making this too complicated?!?
The rest of what is usually a social interaction between two people is all taken on by Donald. All that happens inside MENACE is that, one at a time, either three or four times sequentially, one of its matchbox drawers is opened and a bead is randomly removed, and then either the beads are taken away, or they are put back in the boxes from where they came with either one or three additional beads of the same color, and the boxes are closed.
All the gameness of tic-tac-toe is handled by the human Donald. It is he who initiates the game by handing Alan a string of nine periods. It is he who manages the consistency of subsequent turns by annotating his hand drawn tic-tac-toe board with the moves.
It is he who decides when the game has been won, drawn, or lost, and communicates to Alan the reinforcement signal that is to be applied to the open matchboxes. It is he, Donald, who decides whether and when to initiate a new game.
That today is both the strength and weakness of modern Machine Learning. Really smart people, researchers or engineers, come up with an abstraction for the problem in the real world that they want to apply ML to. Those same smart people figure out how data should flow to and fro between the learning system and the world to which it is to be applied.
They set up machinery which times and gates that information flow from the application. And those same people set up a system which tells the learning system when to learn, to adjust the numbers inside it, in response to a reinforcement signal, or in some other forms of ML a very different, but still similarly abstracted signal—we will see that in the next chapter.
After the design work was done on MENACE, all that could change during learning was the value of the parameters: the numbers of various colored beads in various matchboxes. Those numbers impact the probability of randomly picking a bead of a particular color from a matchbox. If the number of red beads goes down and the number of amber beads goes up over time in a single matchbox, then it is more likely that Alan will pick an amber bead at random.
In this way MENACE has learned that, for the particular situation on a tic-tac-toe board corresponding to that matchbox, the square corresponding to the amber bead is a better square to play than the one corresponding to a red bead.
It does not learn any new structure to the problem while it learns. The structure was designed by a researcher or engineer, in this case Donald Michie. This is completely consistent with most modern Machine Learning systems. The researchers or engineers structure the system and all that can change during learning is a fixed quantity of numbers or parameters, pushing them up or down, but not changing the structure of the system at all.
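This parameter-only learning is small enough to sketch directly. The reinforcement amounts follow the description above (one or three beads returned with the drawn bead for a draw or a win, the bead withheld for a loss); the state encoding, function names, and the initial count of four beads per move are illustrative assumptions, not Michie's exact setup.

```python
import random

# One "matchbox" per board state, with bead counts per legal move.
boxes = {}  # board-state string -> {move: bead count}

def pick_move(state, legal_moves, rng=random):
    """Open the matchbox for this state and draw one bead at random,
    weighted by how many beads each move currently has."""
    box = boxes.setdefault(state, {m: 4 for m in legal_moves})
    moves = list(box)
    return rng.choices(moves, weights=[box[m] for m in moves])[0]

def reinforce(history, outcome):
    """history: (state, move) pairs played this game.
    Win: return the bead plus 3 more; draw: plus 1; loss: keep the bead out."""
    delta = {"win": 3, "draw": 1, "loss": -1}[outcome]
    for state, move in history:
        boxes[state][move] = max(boxes[state][move] + delta, 0)
```

Nothing structural ever changes here: the dictionary of boxes and moves is fixed by the problem encoding, and learning only nudges the integer counts up or down.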
In modern applications of Machine Learning there are often many millions of parameters. Sometimes they take on integer values, as do the numbers of beads in MENACE, but more usually these days the parameters are represented as floating point numbers in computers. Notice how simply changing a big bunch of numbers, while not changing the underlying abstraction that connected the external problem (playing tic-tac-toe) to a geometry-free internal representation (the numbers of different colored beads in matchboxes), is very different from how we have become familiar with using computers.
When we manage our mail box folders, creating special folders for particular categories, we are changing the structure of our information as we go. Machine Learning, as in the case of MENACE, usually has an engineering phase where the problem is converted to a large number of parameters, and after that there is no dynamic updating of structures.
In contrast, I think all our intuitions tell us that our own learning often has our internal mental models tweak and sometimes even radically change how we are categorizing aspects of the skill or capability that we are learning. My computer simulations of MENACE soon had the numbers of beads of a particular color in particular boxes ranging from none or one up to many thousands. Sometimes there will be parameters that are between zero and one, where just a change of one ten-thousandth in value will have drastic effects on the capabilities that the system is learning, while at the same time there will be parameters that are up in the millions.
There is nothing wrong with this, but it does feel a little different from our own introspections of how we might weigh things relatively in our own minds.
If we taught tic-tac-toe to an adult we would think that just a few examples would let them get the hang of the game. My simulation is still making relatively big progress after three thousand games and is often still slowly getting even a little better at four thousand games. In modern Machine Learning systems there may be tens of millions of different examples that are needed to train a particular system to get to adequate performance.
But the system does not just get exposed to each of these training examples once. Often each of those millions of examples needs to be shown to the system hundreds of thousands of times.
Just being exposed to the examples once leaves way too much bias from the most recently processed examples. Instead, by having them re-exposed over and over, after the ML system has already seen all of them many times, the recency bias gets washed away into more equal influence from all the examples.
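That re-exposure loop is simple to sketch. Here, reshuffling each epoch stands in for whatever presentation scheme a real system uses; the function names are illustrative.

```python
import random

def train(examples, update, epochs=100, rng=random):
    """Present every example once per epoch, reshuffled each time, so no
    example's influence depends on having arrived most recently."""
    for _ in range(epochs):
        rng.shuffle(examples)
        for x in examples:
            update(x)

# Each example is seen exactly `epochs` times, in varying order:
counts = {}
def bump(x):
    counts[x] = counts.get(x, 0) + 1

train(list("abc"), bump, epochs=10)
print(counts)  # each of 'a', 'b', 'c' seen 10 times
```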
Training examples are really important. Learning to play against just one of Player A, B, or C always led to very different performance levels against each of these different players (with learning turned off) in my computer simulation of MENACE. This too is a huge issue for modern Machine Learning systems. With millions of examples needed there is often a scale issue of how to collect enough training data. In the last couple of years companies have sprung up which specialize in generating training data sets and can be hired for specific projects.
But getting a good data set which does not have unexpected biases in it can often be a problem. In the parlance of Machine Learning we would say that when MENACE was trained only against Player B, the optimal player, it overfit its playing style to the relatively small number of games that it saw (no wins, and few losses), and so was not capable when playing against more diverse players.
In general, the more complex the problem for which Machine Learning is to be used, the more training data that will be needed. In general, training data sets are a big resource consideration in building a Machine Learning system to solve a problem. The particular form of learning that MENACE both first introduced and demonstrates is reinforcement learning, where the system is given feedback only once it has completed a task.
If many actions were taken in a row, as is the case with MENACE, either three or four moves of its own before it gets any feedback, then there is the issue of how far back the feedback should be used. In the original MENACE all three forms of reinforcement, for a win, a draw, or a loss, were equally applied to all the moves. Certainly it makes sense to apply the reinforcement to the last move, as it directly led to that win or loss.
In the case of a draw however, it could in some circumstances not be the best move as perhaps choosing another move would have given a direct win.
As we move backward, credit for whether earlier moves were best, worst, or indifferent is a little less certain. A natural modification would be three beads for the last move in a winning game, two beads for the next to last, and one bead for the third to last move.
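That decaying-credit variation might look like the following sketch against the matchbox representation. The 3/2/1 schedule is the one just described; the data layout and names are illustrative assumptions.

```python
boxes = {}  # board-state string -> {move: bead count}, as in MENACE

def reinforce_decayed(history, outcome):
    """history: (state, move) pairs in play order. On a win, the last move
    gets 3 extra beads, the one before it 2, the one before that 1, and
    earlier moves none: credit fades as we move back from the outcome."""
    if outcome != "win":
        return
    for distance, (state, move) in enumerate(reversed(history)):
        beads = max(3 - distance, 0)
        box = boxes.setdefault(state, {})
        box[move] = box.get(move, 0) + beads
```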
Of course people have tried all these variations and under different circumstances much more complex schemes would be the best. We will discuss this more, a little later.
In modern reinforcement learning systems a big part of the design is how credit is assigned. In fact now it is often the case that the credit assignment itself is also something that is learned by a parallel learning algorithm, trying to optimize the policy based on the particulars of the environment in which the reinforcement learner finds itself.
Getting front end processing right. This simultaneously drastically cut down the number of parameters that had to be learned, let the learning system automatically transfer learning across different cases in the full world i. Up until a few years ago Machine Learning systems applied to understanding human speech usually had as their front end programs that had been written by people to determine the fundamental units of speech that were in sound being listened to.
Those fundamental units of speech are called phonemes, and they can be very different for different human languages. Different units of speech lead to different words being heard. In earlier speech understanding systems the specially built front end phoneme detector programs relied on some numerical estimators of certain frequency characteristics of the sounds and produced phoneme labels as their output that were fed into the Machine Learning system to recognize the speech.
It turned out that those detectors were limiting the performance of the speech understanding systems no matter how well they learned. Getting the front end processing right for an ML problem is a major design exercise. Getting it wrong can lead to much larger learning systems than necessary, making learning slower, perhaps impossibly slower, or it can make the learning problem impossible if it destroys vital information from the real domain.
Unfortunately, since in general it is not known whether a particular problem will be amenable to a particular Machine Learning technique, it is often hard to debug where things have gone wrong when an ML system does not perform well.
Perhaps inherently the technique being used will not be able to learn what is desired, or perhaps the front end processing is getting in the way of success. Just as MENACE knew no geometry and so tackled tic-tac-toe in a fundamentally different way than how a human would approach it, most Machine Learning systems are not very good at preserving geometry nor therefore are they good at exploiting it.
Geometry does not play a role in speech processing, but for many other sorts of tasks there is some inherent value to the geometry of the input data. The engineers or researchers building the front end processing for the system need to find a way to accommodate the poor geometric performance of the ML system being used. The issue of geometry and the limitations of representing it in a set of numeric parameters arranged in some fixed system, as was the case in MENACE, has long been recognized.
While people have attributed all sorts of motivations to the authors I think that their insights on this front, formally proved in the limited cases they consider, still ring true today.
Fixed structure stymies generalization. The fixed structures spanning thousands or millions of variable numerical parameters of most Machine Learning systems likewise stymie generalization. We will see some surprising consequences of this when we look at some of the most recent exciting results in Machine Learning in a later blog post—programs that learn to play a video game but then fail completely and revert to zero capability on exactly the same game when the colors of pixels are mapped to different colorations, or if each individual pixel is replaced by a square of four identical pixels.
Furthermore, any sort of meta-learning is usually impossible too. A child might learn a valuable meta-lesson in playing tic-tac-toe: when you have an opportunity to win, take it immediately, as it might go away if the other player gets to take a turn. Machine Learning engineers and researchers must, at this point in the history of AI, form an optimized and fixed description of the problem and let ML adjust parameters. All possibility of reflective learning is removed from these very impressive learning systems.
This greatly restricts how much power of intelligence an AI system built with current day Machine Learning can tease out of its learning exploits. Humans are generally much, much smarter than this. There have been some developments in reinforcement learning since then, but only in details, as this section shows. Reinforcement learning is still an active field of research and application today. It is commonly used in robotics applications, and for playing games. It was part of the system that beat the world Go champion, but we will come back to that in a little bit.
Without resorting to the mathematical formulation, today reinforcement learning is used where there are a finite number of states that the world can be in.
For each state there are a number of possible actions (the different colored beads in each matchbox corresponding to the possible moves). The policy that the system currently has is the probability of each action in each state, which for MENACE corresponds to the number of beads of a particular color in a matchbox divided by the total number of beads in that same matchbox.
Reinforcement learning tries to learn a good policy. The structure of states and actions for MENACE, and indeed for reinforcement learning for many games, is a special case, in that the system can never return to a state once it has left it. That would not be the case for chess or Go where it is possible to get back to exactly the same board position that has already been seen. In some cases they are probabilities, and for a given state they must sum to exactly one.
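The beads-to-probabilities step just described is plain normalization; a minimal sketch (names are illustrative):

```python
def policy(box):
    """Turn one matchbox's bead counts into action probabilities."""
    total = sum(box.values())
    return {move: count / total for move, count in box.items()}

p = policy({"corner": 6, "center": 2})
print(p)  # {'corner': 0.75, 'center': 0.25}
```

By construction the probabilities for a given state sum to exactly one, matching the constraint mentioned above.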
For many large reinforcement learning problems, rather than represent the policy explicitly for each state, it is represented as a function approximated by some other sort of learning system such as a neural network, or a deep learning network. The steps in the reinforcement process are the same, but rather than changing values in a big table of states and actions, the parameters of MENACE, a learning update is given to another learning system. MENACE, and many other game playing systems, including chess and Go this time, are a special case of reinforcement learning in another way.
The learning system can see the state of the world exactly. In many robotics problems where reinforcement learning is used that is not the case. There the robot may have sensors which can not distinguish all the nuances in the world. But in reality it could be that an early move was good, and just a dumb move at the end was bad.
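One widely used answer to this credit problem is to learn, for every state-action pair, an estimate of the eventual reward, updated one step at a time. A minimal tabular sketch, using the standard symbols alpha (learning rate) and gamma (discount factor) with illustrative values:

```python
from collections import defaultdict

Q = defaultdict(float)   # (state, action) -> estimated eventual reward
alpha, gamma = 0.1, 0.9  # illustrative values

def q_update(state, action, reward, next_state, next_actions):
    """Move Q(s, a) a little toward reward + gamma * best Q at the next
    state, so credit propagates backward one step per update."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

Because each state's estimate leans on the next state's, a good early move eventually gets credit even when the reward arrives many moves later.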
The Q function that he learns is an estimate of what the ultimate reward will be by taking a particular action in a particular state. This is how they built their AlphaGo program, which recently beat both the human Korean and Chinese Go champions. As a side note, when I visited DeepMind in June this year I asked how well their program would have done if on the day of the tournament the board size had been changed from 19 by 19 to 29 by 29. I estimated that the human champions would have been able to adapt and still play well.
My DeepMind hosts laughed and said that even changing to an 18 by 18 board would have completely wiped out their program… this is rather consistent with what we have observed about MENACE. AlphaGo plays Go in a way that is very different from how humans apparently play Go. In English, at least, ships do not swim. Ships cruise or sail, whereas fish and humans swim. However in English planes fly, as do birds. By extension people often fly when they go on vacation or on a business trip.
Birds move from one place to another by traveling through the air. These days, so too can people. But really people do not fly at all like birds fly. Birds who can fly that far non-stop (and there are some) certainly take a lot longer than a day to do that. If humans could fly like birds we would think nothing of chatting to a friend on the street on a sunny day, and as they walk away, flying up into a nearby tree, landing on a branch, and being completely out of the sun.
If I could fly like a bird then when on my morning run I would not have to wait for a bridge to get across the Charles River to get back home, but could choose to just fly across it at any point in its meander. We do not fly like birds. Human flying is very different in scope, in method, and in auxiliary equipment beyond our own bodies.
Arthur Samuel introduced the term Machine Learning for two sorts of things his computer program was doing as it got better and better over time at and through the experience of playing checkers. A person who got better and better over time at and through the experience of playing checkers would certainly be said to be learning to be a better player.
Thus, in the first sentence of his paper, Samuel justifies the term learning. What I have tried to do in this post is to show how Machine Learning works, and to provide an argument that it works in a way that feels very different to how human learning of similar tasks proceeds. Thus, taking an understanding of what it is like for a human to learn something and applying that knowledge to an AI system that is doing Machine Learning may lead to very incorrect conclusions about the capabilities of that AI system.
These are words that have so many different meanings that people can understand different things by them. Even for humans it surely refers to many different sorts of phenomena. Learning to ride a bicycle is a very different experience from learning ancient Latin. And there seems to be very little in common in the experience of learning algebra and learning to play tennis. So, too, is Machine Learning very different from any of the myriad different learning capabilities of a person.
I think we are in that same position today in regard to Machine Learning. The papers in conferences fall into two categories. One is mathematical results showing that yet another slight variation of a technique is optimal under some carefully constrained definition of optimality. A second type of paper takes a well known learning algorithm and some new problem area, and designs the mapping from the problem to a data representation.
This would all be admirable if our Machine Learning ecosystem covered even a tiny portion of the capabilities of human learning. But I see little evidence of that. Many practitioners have neither any understanding of how their tiny little narrow technical field fits into a bigger picture of intelligent systems, nor do they care. They think that the current little hype niche is all that matters, are blind to its limitations, and are uninterested in deeper questions.
I recommend reading Christopher Watkins' Ph.D. thesis. It revitalized reinforcement learning by introducing Q-learning, and that is still having impact today, thirty years later. But more importantly, most of the thesis is not about the particular algorithm or proofs about how well it works under some newly defined metric.
Instead, most of the thesis is an illuminating discussion about animal and human learning, and attempting to get lessons from there about how to design a new learning algorithm.
And then he does it.

Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press.

He was certainly the oldest person in the lab at that time. He was the principal author of the full screen editor (a rarity at that time) that we had, called Edit TV, or ET at the command level. He was still programming at age 85, and last logged in to the computer system when he was 88, a few months before he passed away.

Watkins was unable to tell exactly from reading the paper.
H. Surcombe, and D. Hobbs, Cambridge University Press.

Many people have since built copies of MENACE, both physically and in computer simulations, and all the ones that I have found on the web report matchboxes, virtual or otherwise.
Note that in total there are , different legal ways to play out a game of tic-tac-toe. If we consider only essentially different situations by eliminating rotational and reflective symmetries then that number drops to 31,

I really appreciate your insightful discussion of machine learning.
I have always been interested in the problem of how humans learn to change the ontology with which they describe the world. Implementing this computationally, including on robots, is a good way to test the viability of such models and incidentally can create artifacts of great value. But for me, the interesting question is how the learner can go from one level of description of the world to another in which both learning and problem-solving are vastly easier.
Some years ago, I wrote an essay on this, especially focused on spatial knowledge: You ground it in getting around and interacting with the real world, not in playing games. I found it far more admirable than this Christopher Watkins and his angle on reinforcement learning — although you are both inspired by animal behavior, he seems to take exactly the wrong parts from it, in a way that works much better for games than for the real world.
Many are focusing on very specialized techniques and feel more like technical reports than actual research. How a discovery was made, how it relates to other approaches in a qualitative sense, and how progress could be made, also structurally, is often just mentioned very briefly.
More often than not, a result is presented without much deduction or context (besides some hand-waving references to entire papers), and then some statistical validation or mathematical proof is presented. How an idea was obtained, the reasoning or inspiration behind it, is not mentioned, even though it gives a better understanding of how to interpret the results, or would ease the transfer of that insight to other domains.
A lot of publications would benefit from presenting ideas from various points of view, and various abstraction levels. Most often, the only description provided is pretty technical, specialized, and low level. As such it often feels similar to reverse engineering programs written in assembly language, to extract higher level concepts and non-machine-specific terms or concepts that are actually usable.
Even if a topic is pretty technical and specific in nature, it would be possible to lift it from its implementation or optimization details. Especially valuable would be providing a framework that allows for experimentation and validation to explore alternatives (most experiments lack the amount of detail necessary to be reproduced without filling major holes with guesswork). Instead of describing a technical method with words, source code should be available, or some other formal executable form.
Rodney Brooks: Robots, AI, and other stuff.

Some board positions may not result in so many different looking positions when rotated or reflected.
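The point about rotations and reflections can be made concrete with a short sketch (illustrative; boards are flat 9-tuples) that reduces each of the nine opening moves of tic-tac-toe to a canonical symmetry class:

```python
def rotate(b):
    """90-degree rotation of a flat 3x3 board (a 9-tuple)."""
    return tuple(b[3 * (2 - i % 3) + i // 3] for i in range(9))

def mirror(b):
    """Left-right reflection of a flat 3x3 board."""
    return tuple(b[3 * (i // 3) + 2 - i % 3] for i in range(9))

def canonical(b):
    """A fixed representative of b's symmetry class: the smallest of
    its eight rotated/reflected variants."""
    variants = []
    for v in (b, mirror(b)):
        for _ in range(4):
            variants.append(v)
            v = rotate(v)
    return min(variants)

# The nine opening moves collapse to three essentially different ones:
openings = {canonical(tuple("X" if i == m else "." for i in range(9)))
            for m in range(9)}
print(len(openings))  # 3: corner, edge, center
```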
How well do matchboxes learn?

Alan would find it, simply by matching character for character, in the following part of the table for the first and second moves by MENACE:

Summary of What Alan Must Do

With these modifications we have made the job of Alan both incredibly simple and incredibly regimented. When Donald gives Alan a string of nine characters, Alan looks it up in a table, noting the matchbox number and transform number.
He opens the numbered matchbox, randomly picks a bead from it and leaves it on the table in front of the open matchbox. He looks up the color of the bead in the numbered transform, to get a number between one and nine.
For L he removes the beads on the table and closes the open matchboxes. For D he adds one more bead of the same color to each one on the table, and puts the pairs in the matchboxes behind them, and closes the matchboxes.