In my youth, as a financial analyst with IBM, a bunch of us financial guys would get together from time to time to play a friendly game of poker. We rotated from house to house, playing from 9:00 pm until about 3:00 in the morning. These were great games. What was really interesting was how financial guys who were used to moving millions of dollars from one column to another on a piece of paper during the working day changed their attitudes toward money when it came to their own twenty-five cents at night. Among other observations was the strong correlation between how the guys played the game of poker and how well they ultimately did in their careers as financial analysts, and later, as financial executives. Studying that correlation would be an interesting piece of research on its own.
Some of you may know of a famous book, The Education of a Poker Player, by Herbert O. Yardley. He was a code-breaker working for the United States between the First and Second World Wars. He also wrote The American Black Chamber about those experiences. (Both books are available on Amazon.) It was from his poker book that I learned that the best way to play poker is to play it tight. Bet only good hands and don't bluff, hardly ever. That sums up the game pretty well, and it worked for me. That book, plus my experience playing the game, made me curious to know whether it would be possible to conduct a computer experiment to see if the computer could also learn how to play the game. I also wanted to find support, or not, for Yardley's theory. To do this it would be necessary to create a simulation of a poker game, in which various betting strategies would compete and be tested against each other.
I had an IBM Personal Computer with a BASIC compiler available to me. The first step was to develop a set of computer algorithms that could interpret the value of a poker hand. In our guy-games we played seven-card stud, so I built the simulation on that game. The betting starts with the third card, so a player is betting before the hand is complete. That meant that the computer had to assign some value to a partial hand: a pair or three-of-a-kind had obvious value, but so did a three- or four-card straight or flush. When the last card was dealt, the best five cards were picked and the hand rankings were those that follow the rules of the game.
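The original program was written in BASIC; a minimal sketch in Python of this kind of partial-hand valuation might look like the following. The scoring weights here are my own illustrative assumptions, not the values the original program used.

```python
from collections import Counter

def partial_hand_value(ranks, suits):
    """Score an incomplete stud hand (3 to 6 cards).

    ranks: card ranks as integers 2..14 (14 = ace)
    suits: one letter per card, e.g. 'C', 'D', 'H', 'S'
    The weights below are illustrative assumptions only.
    """
    score = 0
    # pairs, trips, and quads have obvious value
    for rank, n in Counter(ranks).items():
        if n == 2:
            score += 20 + rank
        elif n == 3:
            score += 60 + rank
        elif n == 4:
            score += 200 + rank
    # three or four cards toward a flush also carry value
    flush_count = max(Counter(suits).values())
    if flush_count >= 3:
        score += 10 * flush_count
    # consecutive ranks toward a straight
    uniq = sorted(set(ranks))
    run = best = 1
    for a, b in zip(uniq, uniq[1:]):
        run = run + 1 if b - a == 1 else 1
        best = max(best, run)
    if best >= 3:
        score += 8 * best
    return score
```

A pair of nines scores higher than three unrelated cards, and three cards to a straight flush score higher still, which is the kind of ordering a partial-hand evaluator needs to produce.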
The next step was to simulate a game, that is, seven players, one hand of seven cards for each player with four rounds of betting, and a winner. The players would be dealt their cards and they would place their bets. Players could bet, fold, call, or raise. It was a limit game, and up to three raises were allowed for each round of betting. When a hand was over, the winner would get the chips added to his holdings. An evening of play would consist of 40 hands.
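A single round of the limit betting described above, with at most three raises, could be sketched as follows. The fold and raise thresholds, and the simplified single-pass structure, are my own assumptions for illustration; the actual betting logic was richer than this.

```python
LIMIT = 1        # fixed bet size in a limit game
MAX_RAISES = 3   # up to three raises per round, as described

def betting_round(players, hand_values, pot):
    """One simplified betting round.

    players: names of players still in the hand
    hand_values: name -> current (partial) hand value
    Returns (players still active, new pot size).
    The thresholds below are illustrative assumptions.
    """
    raises = 0
    to_call = 0
    active = list(players)
    for p in list(active):
        value = hand_values[p]
        if value < to_call * 10:                 # too weak to call: fold
            active.remove(p)
        elif value > 50 and raises < MAX_RAISES: # strong hand: raise
            to_call += LIMIT
            raises += 1
            pot += to_call
        else:                                    # otherwise call (or check)
            pot += to_call
    return active, pot
```

When a hand is over, the winner among the remaining active players would collect the pot, as the text describes.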
Then we wanted to simulate a tournament. For this we created a pool of 32 players. Seven players would be randomly selected for an evening of play. There were about 30 evenings of play in a tournament. After each evening, a new set of players would be selected for the next evening. In a tournament, all 32 players would have the chance to play several evenings.
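The tournament structure just described (a pool of 32 players, about 30 evenings, seven randomly chosen players per evening, 40 hands per evening) is straightforward to sketch. Here play_hand is a placeholder for the hand simulation itself.

```python
import random

POOL_SIZE = 32          # pool of players in a tournament
EVENINGS = 30           # evenings of play per tournament
TABLE_SIZE = 7          # players randomly selected each evening
HANDS_PER_EVENING = 40  # an evening of play

def run_tournament(play_hand, rng=random.Random(0)):
    """Run one tournament; returns each player's chip winnings.

    play_hand(table, rng) must return (winning player, pot won);
    it stands in for the full hand simulation.
    """
    chips = {p: 0 for p in range(POOL_SIZE)}
    for _ in range(EVENINGS):
        table = rng.sample(range(POOL_SIZE), TABLE_SIZE)
        for _ in range(HANDS_PER_EVENING):
            winner, pot = play_hand(table, rng)
            chips[winner] += pot
    return chips
```

Over 30 evenings, every player in the pool gets several chances to play, which is what makes the final chip counts a fair basis for ranking best and worst.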
Each of the 32 players was given a strategy of play, that is, a set of rules to determine under what circumstances the player would bet, fold, call, or raise. Now this strategy was like a little computer program. The betting was based on the quality of the player's hand plus knowledge of the betting taking place during the game. It was not nearly as complicated as what a real player would bring into the game. For example, a player did not see or take into account the other cards that would have been showing on the table. Nor did any player have a memory of how other players had played in the past. These would be important factors in a real game. However, the strategy program that was used was multidimensional and complex enough to provide for a wide variation in betting styles.
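One plausible way to encode such a strategy is as a small vector of numeric thresholds, sketched below in Python. The field names (fold_below, raise_above, bluff_rate) are hypothetical; the original program's strategy representation was more elaborate, but the idea of parameters driving a fold/call/raise decision is the same.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    """A betting strategy as a few numeric parameters (illustrative)."""
    fold_below: float    # fold if hand value falls under this
    raise_above: float   # raise if hand value exceeds this
    bluff_rate: float    # chance of betting a weak hand anyway

def decide(strategy, hand_value, chance):
    """Return 'fold', 'call', or 'raise' for one betting decision.

    chance is a number in [0, 1), e.g. from random.random(),
    used to decide whether to bluff a weak hand.
    """
    if hand_value >= strategy.raise_above:
        return "raise"
    if hand_value < strategy.fold_below:
        return "raise" if chance < strategy.bluff_rate else "fold"
    return "call"
```

A "tight" player is one with a high fold_below and a bluff_rate near zero; a "loose" player is the opposite. Varying these parameters across the 32 players is what produces the wide variation in betting styles.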
Now, the purpose of this experiment was to see if players can, with experience, learn to become better poker players. But we did not allow the players to change their strategies, or learn from their own experiences. Instead, we added a new kind of learning: evolutionary learning.
After a tournament was over, one player would emerge as the best of the 32, and another player would be revealed as the worst. In the pool of 32 players, the best player would remain to play in future tournaments, but the worst player would be dropped. A new player would be created, one that had a slightly, randomly modified strategy based on the best player. The new player was not a clone of the best, but more akin to a child, that is, one with characteristics similar to, but not identical to, the parent's.
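The replacement step just described can be sketched as follows: drop the worst performer and add a randomly perturbed copy, a "child," of the best performer's strategy parameters. The Gaussian mutation and its scale are assumptions of mine, not the original program's method.

```python
import random

def evolve(strategies, chips, rng=random.Random(0), scale=0.1):
    """Replace the worst player with a mutated child of the best.

    strategies: name -> list of numeric strategy parameters
    chips: name -> chip winnings from the last tournament
    Returns a new strategies dict; the mutation scheme is illustrative.
    """
    best = max(chips, key=chips.get)
    worst = min(chips, key=chips.get)
    parent = strategies[best]
    # similar, but not identical, to the parent
    child = [p + rng.gauss(0, scale * (abs(p) + 1)) for p in parent]
    new = dict(strategies)
    del new[worst]
    new[f"child_of_{best}"] = child
    return new
```

Repeating this step after every tournament is what lets the pool, rather than any individual player, do the learning.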
You might suspect that running through a bunch of tournaments took a lot of computer time, and you would be right. I set the computer to running on Friday afternoon, let it run over a weekend, and would check it when I got back to the office on Monday morning. The computer could complete several hundred tournaments over that time period.
This is what I found out.
By Monday the set of 32 players had been completely replaced by new players, and they played a very tight game, as Yardley would have predicted. It substantiated his thesis that a tight game is a good game. But wait, there's more.
One Monday morning I was utterly surprised and amazed by an unexpected result. Instead of finding all of the players playing a tight game, as had happened in the past, the players had bifurcated into two distinct camps. One group played the tight game, but surprisingly, another group survived by playing in an utterly bold and reckless manner. These players would bet and raise on virtually nothing.
Now, I kept records of player lineage, that is, which players were related to previous players, and I could see them being removed and replaced as the tournaments followed, one after the other. What happened was that over time, two separate and distinct species of players began to dominate the others.
Now we are seeing an example of learning, not of individuals, but of a species. Over time, the species evolved to better live in its environment, which in this case consists of playing poker. However, with no outside influence, the species divided into two different groups. Although there is not enough evidence from the game to prove this, it appeared to me that the two types of players were able to gang up on the others, and wipe them out. In other words, the tight players and the loose players in the game were not so much in competition as in a symbiotic relationship with each other.
This experiment differs from our previous ones in a couple of important ways. Previously we talked about three conditions that must exist for a creative process: A goal, a way of creating options, and a way of measuring the value of those options against the goal. This has not changed, but here we have introduced a different concept of what a goal should be. Instead of the goal being a single measure of success or achievement, it is a goal of survival. The goal is for one solution to be better than another solution and therefore to be able to survive in a hostile environment.
The results also illustrated that there may be several, possibly numerous and significantly different, ways to achieve the goal, just as different species can survive together in the same environment. For each species, the presence of other species represents a part of that environment. The results also suggest an explanation for the observed fact that there is sometimes more competition within species than between them.
Another difference between this and previous experiments was that the success or failure of a player was influenced both by the player's skill (the quality of his strategy) and by chance. Bad players can win and good players can lose. I believe this factor was important in permitting species bifurcation. The luck factor allows a less effective strategy to continue to exist and to evolve in its own direction without being wiped out right away.
What have I done with this simulation since? Sorry to say, not much. I have wanted to continue the experiment to build into it player strategies that are more realistic, including the ability to see and respond to competing players' cards and betting patterns. The player's strategy is a computer program, of course, so for our purpose we need an evolvable computer language that can represent a strategy. But is there a computer language that can be modified randomly and still function? The languages we have created use parameters to represent functions, but within a very restricted framework. The parameters are modified randomly but the basic structure is not. Possibly some kind of parameterized object-oriented language yet to be created? Perhaps something designed along the lines of DNA? There is work yet to be done and I am still looking.
Next month: What are we missing? The Human Factor
Richard Ten Dyke, a member of the Danbury Area Computer Society, has previously contributed to this newsletter on the topic of Digital Photography. He is retired from IBM and can be reached at email@example.com. All opinions are his own, and he welcomes comments.