Keywords: Board Games, Reinforcement Learning, TD(λ), Self-play
II. INTRODUCTION
Artificial Intelligence approaches can be …
Blondie24 is a checkers-playing program developed by Kumar Chellapilla and David B. Fogel. The purpose was to determine the effectiveness of an artificial-intelligence checkers-playing program. Blondie24 played against some 165 human opponents and was shown to achieve a rating of 2048, better than 99.61% of the playing population of the website on which it played. The design of Blondie24 is based on a minimax search of the checkers game tree in which the evaluation function is an artificial neural network. The network receives a vector representation of the checkerboard position as input and returns a single value, which is passed on to the minimax algorithm. The weights of the neural network were obtained by an evolutionary algorithm (an approach now called neuroevolution): a population of Blondie24-like programs played checkers against each other, and those that performed relatively poorly were eliminated. Performance was measured by a points system: each program earned one point for a win, zero points for a draw, and lost two points for a loss. After the poor performers were eliminated, the process was repeated with a new population derived from the winners. In this way, the result was an evolutionary process that selected for programs that played progressively stronger checkers, as sketched below. The significance of the Blondie24 program is that its ability to play checkers did not rely on any human expertise of the game. Rather, it came …
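To make the selection scheme concrete, the following Python sketch implements a Blondie24-style neuroevolution loop. It is a minimal illustration under assumed parameters; every name and value in it (POP_SIZE, MUTATION_STD, the stub play_game engine, the number of games per program) is hypothetical rather than taken from Fogel's actual implementation.

```python
# A minimal sketch of the Blondie24-style neuroevolution loop described above.
# All names and constants are illustrative assumptions, not Fogel's code.
import random

POP_SIZE = 30        # number of candidate evaluation networks (assumed)
GENERATIONS = 100    # evolutionary iterations (assumed)
MUTATION_STD = 0.05  # Gaussian mutation scale (assumed)

def random_weights(n=100):
    """Random initial weight vector for one evaluation network."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def mutate(weights):
    """Offspring: a noisy copy of a surviving network's weights."""
    return [w + random.gauss(0.0, MUTATION_STD) for w in weights]

def play_game(weights_a, weights_b):
    """Placeholder for a full checkers game between two minimax players
    whose evaluation networks use the given weights. Returns +1 if A wins,
    -1 if B wins, 0 for a draw. A real game engine would go here."""
    return random.choice([1, 0, -1])  # stub

population = [random_weights() for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    # Scoring from the text: +1 for a win, 0 for a draw, -2 for a loss.
    scores = [0.0] * POP_SIZE
    for i in range(POP_SIZE):
        for _ in range(5):  # each program plays several random opponents
            j = random.choice([k for k in range(POP_SIZE) if k != i])
            result = play_game(population[i], population[j])
            if result > 0:
                scores[i] += 1
            elif result < 0:
                scores[i] -= 2
    # Eliminate the poorer half; refill from mutated copies of the winners.
    ranked = sorted(range(POP_SIZE), key=lambda k: scores[k], reverse=True)
    survivors = [population[k] for k in ranked[: POP_SIZE // 2]]
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(POP_SIZE - len(survivors))]
```

Note that nothing in this loop encodes checkers knowledge beyond the rules enforced by the game engine; selection pressure on game outcomes alone drives the improvement.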
Although TD-Gammon greatly surpassed all previous computer programs in its ability to play backgammon, that was not why it was developed. Rather, its purpose was to explore some new ideas and approaches to traditional problems in the field of reinforcement learning. On each turn while playing a game, TD-Gammon examines all possible legal moves and all their possible responses (a two-ply look-ahead), feeds each resulting board position into its evaluation function, and chooses the move that leads to the board position with the highest score. In this respect, TD-Gammon is no different from almost any other computer board-game program. TD-Gammon's innovation was in how it learned its evaluation function. Its learning algorithm updates the weights in its neural network after each turn to reduce the difference between its evaluations of previous turns' board positions and its evaluation of the present turn's board position, hence "temporal-difference learning". TD-Gammon was designed as a way to explore the capability of multilayer neural networks trained by TD(λ) to learn complex nonlinear functions. It was also designed to provide a detailed comparison of the TD learning approach with the alternative approach of supervised training on a corpus of expert-labelled exemplars.
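The following Python sketch illustrates the two mechanisms just described: greedy selection over evaluated afterstates and the TD(λ) weight update. It is a simplified assumption-laden sketch, not Tesauro's implementation: the evaluation here is linear over board features where TD-Gammon uses a multilayer neural network, the move selection is shown at one ply rather than two, and all names and constants (ALPHA, LAM, choose_move, td_update) are hypothetical.

```python
# A minimal sketch of afterstate selection and the TD(λ) update described
# above, using a linear evaluation for clarity. TD-Gammon itself uses a
# multilayer neural network and a deeper (two-ply) look-ahead.
import numpy as np

ALPHA = 0.1       # learning rate (assumed value)
LAM = 0.7         # trace-decay parameter λ (assumed value)
N_FEATURES = 198  # size of a raw board feature encoding (assumed)

w = np.zeros(N_FEATURES)      # evaluation weights
trace = np.zeros(N_FEATURES)  # eligibility trace, one entry per weight

def value(x):
    """Score a board feature vector; linear here, a neural net in TD-Gammon."""
    return float(w @ x)

def choose_move(afterstates):
    """Given (move, feature_vector) pairs for every position reachable by a
    legal move, return the move whose resulting position evaluates highest,
    i.e. the selection rule described in the text."""
    best_move, _ = max(afterstates, key=lambda pair: value(pair[1]))
    return best_move

def td_update(x_prev, x_curr, reward=0.0):
    """After each turn, nudge w so that value(x_prev) moves toward
    reward + value(x_curr); the gap between the two evaluations is the
    temporal difference. The eligibility trace spreads each correction
    back over the evaluations of earlier turns' positions as well."""
    global w, trace
    delta = reward + value(x_curr) - value(x_prev)  # TD error
    trace = LAM * trace + x_prev  # for a linear value, the gradient is x_prev
    w += ALPHA * delta * trace
```

At the end of a game, the same update is applied with the game's outcome as the reward and the terminal position's value taken as zero, which anchors the learned evaluations to actual results.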