Award

No. | Paper Title and Authors | First Page of Paper
1
Emergent Bluffing and Inference with Monte Carlo Tree Search
Peter Cowling, Daniel Whitehouse and Edward Powley

Abstract—In many card and board games, players cannot see the whole game state, with different players seeing different parts of the state. In such games, gathering of information (inference) is a key strategic aspect, to which information hiding (bluffing, among other techniques) is an important countermeasure. Monte Carlo Tree Search (MCTS) is a powerful general-purpose technique for decision making in games. MCTS rose to prominence through successes in combinatorial board games such as Go, but more recently has demonstrated promise in card, board and video games of incomplete information. MCTS can construct robust plans in stochastic environments (making it strong in some games), but in its vanilla form is unable to infer or bluff (making it weak in games where this is a central feature).
In this paper, we augment MCTS with mechanisms for performing inference and bluffing. Like all algorithms based on game tree search, MCTS implicitly constructs a model of the opponents’ decision processes. We show that this model can be repurposed to perform an approximation of Bayesian inference. We also obtain bluffing behaviour by self-determinization (in- troducing “impossible” worlds into the agent’s pool of sampled states). We test our algorithms on The Resistance, a popular card game based around hidden roles. 
PDF
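The inference mechanism described above can be pictured as reweighting sampled hidden worlds by how probable the opponent's observed action is under the searcher's own model of that opponent; self-determinization then corresponds to deliberately adding worlds that are inconsistent with the agent's own hidden information to the sampled pool. A minimal sketch of the reweighting idea, with hypothetical names and a uniform fallback that are assumptions rather than the authors' implementation:

```python
# Illustrative sketch only: approximate Bayesian inference over candidate hidden
# worlds, reusing an opponent model (e.g. normalised MCTS visit counts) as the
# likelihood term. All interfaces here are hypothetical.

def update_beliefs(worlds, prior, observed_action, opponent_action_prob):
    """Return a posterior over candidate hidden worlds (determinizations).

    prior: dict mapping world -> prior probability
    opponent_action_prob: function (world, action) -> P(action | world)
    """
    posterior = {}
    for w in worlds:
        posterior[w] = prior.get(w, 0.0) * opponent_action_prob(w, observed_action)
    total = sum(posterior.values())
    if total == 0.0:
        # the observation ruled out every sampled world; fall back to the prior
        return dict(prior)
    return {w: p / total for w, p in posterior.items()}
```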
2 (Winner)
Towards Generating Arcade Game Rules with VGDL
Thorbjørn S. Nielsen, Gabriella A. B. Barros, Julian Togelius and Mark J. Nelson

Abstract—We describe an attempt to generate complete arcade games using the Video Game Description Language (VGDL) and the General Video Game Playing environment (GVG-AI). Games are generated by an evolutionary algorithm working on genotypes represented as VGDL descriptions. In order to direct evolution towards good games, we need an evaluation function that accurately estimates game quality. The evaluation function used here is based on the differential performance of several game-playing algorithms, or Relative Algorithm Performance Profiles (RAPP): it is assumed that good games allow good players to play better than bad players. For the purpose of such evaluations, we introduce two new game tree search algorithms, DeepSearch and Explorer; these perform very well on benchmark games and constitute a substantial subsidiary contribution of the paper. In the end, the attempt to generate arcade games is only partially successful, as some of the games have interesting design features but are barely playable as generated. An analysis of these shortcomings yields several suggestions to guide future attempts at arcade game generation. 
PDF
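The RAPP evaluation described above can be summarised in a few lines: a candidate game scores highly when a strong agent outperforms a weak agent by a wide margin. A rough sketch, assuming hypothetical `game.play(agent)` and agent interfaces:

```python
# Hypothetical sketch of a Relative Algorithm Performance Profile (RAPP) fitness:
# good games should let good players do noticeably better than bad players.

def average_score(game, agent, episodes=20):
    """Mean score an agent obtains over several plays of the candidate game."""
    return sum(game.play(agent) for _ in range(episodes)) / episodes

def rapp_fitness(game, strong_agent, weak_agent):
    """Fitness used to guide evolution: the performance gap between the agents."""
    return average_score(game, strong_agent) - average_score(game, weak_agent)
```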
3
EnHiC: An Enforced Hill Climbing Based System for General Game Playing
Amin Babadi, Behnaz Omoomi and Graham Kendall

Abstract—Accurate decision making in games has always been a complex and interesting problem in Artificial Intelligence (AI). General video game playing (GVGP) is a new branch of AI whose goal is to design agents that are able to win in any unknown game environment by making wise decisions. This paper proposes a new search methodology based on enforced hill climbing for use in GVGP, and we evaluate its performance on the benchmarks of the general video game AI competition (GVG-AI). We also propose a simple and efficient heuristic function for GVGP. The results show that EnHiC outperforms several well-known and successful methods in the GVG-AI competition.
PDF
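For context, enforced hill climbing escapes local optima by searching breadth-first from the current state until it finds a strictly better state, then committing to the path that reached it. A generic sketch under assumed interfaces (a `state.successors()` method and a heuristic where higher is better); this is not the authors' EnHiC code:

```python
# Generic enforced hill climbing step (illustrative, hypothetical interfaces).
from collections import deque

def ehc_step(state, heuristic):
    """Breadth-first search from `state` for the first successor that strictly
    improves the heuristic; return the action sequence that reaches it."""
    best = heuristic(state)
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        current, path = frontier.popleft()
        for action, nxt in current.successors():
            if nxt in seen:
                continue
            seen.add(nxt)
            if heuristic(nxt) > best:   # first strict improvement (BFS => shallowest)
                return path + [action]
            frontier.append((nxt, path + [action]))
    return []                           # no improving state is reachable
```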
4
Belief-State Monte-Carlo Tree Search for Phantom Games
Jiao Wang, Tan Zhu, Hongye Li, Chu-Hsuan Hsueh and I-Chen Wu

Abstract—Playing games with imperfect information is a very challenging issue in the AI field due to its high complexity. Phantom games are one such class: they usually have a large game-tree complexity and have received little research attention until now. In Phantom games, rational human players commonly select actions according to their beliefs about the game, which can be represented using the concept of a belief-state. To the best of our knowledge, this paper is the first to incorporate belief-states into Monte-Carlo Tree Search; the proposed algorithm is named BS-MCTS (Belief-State Monte-Carlo Tree Search). In BS-MCTS, a belief-state tree, in which each node is a belief-state, is constructed, and the search proceeds according to beliefs updated by heuristic search information. We also present two novel belief-learning mechanisms, Opponent Guessing and Opponent Predicting, concerning the probability of the possible states and of the opponent's future actions, respectively. To demonstrate the effectiveness of our algorithm, BS-MCTS is applied to Phantom Tic-Tac-Toe and Phantom Go against other Monte-Carlo methods. The experimental results show that our method performs strongly against these baselines. Moreover, based on BS-MCTS, our Phantom Go program has won three consecutive championships in Chinese National Tournaments.
PDF
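A toy illustration of the two belief-learning quantities named in the abstract: Opponent Guessing (a distribution over hidden states consistent with the observations) and Opponent Predicting (a distribution over the opponent's next actions). The update rule and names below are assumptions for illustration, not the paper's algorithm:

```python
# Toy belief-state bookkeeping for a BS-MCTS-style search (hypothetical sketch).

class BeliefState:
    def __init__(self, candidate_states, candidate_actions):
        n, m = len(candidate_states), len(candidate_actions)
        self.state_probs = {s: 1.0 / n for s in candidate_states}    # Opponent Guessing
        self.action_probs = {a: 1.0 / m for a in candidate_actions}  # Opponent Predicting

    def _reinforce(self, table, key, reward, step=0.1):
        # increase the weight of `key`, then renormalise to keep a distribution
        table[key] += step * reward
        total = sum(table.values())
        for k in table:
            table[k] /= total

    def guess(self, state, reward):
        """Shift belief towards hidden states under which search found high reward."""
        self._reinforce(self.state_probs, state, reward)

    def predict(self, action, reward):
        """Shift the prediction towards opponent actions the search expects them to prefer."""
        self._reinforce(self.action_probs, action, reward)
```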
5
Building a Computer Mahjong Player Based on Monte Carlo Simulation and Opponent Models
Naoki Mizukami and Yoshimasa Tsuruoka

Abstract—Predicting opponents’ moves and hidden states is important in imperfect information games. This paper describes a method for building a Mahjong program that models opponent players and performs Monte Carlo simulation with the models. We decompose an opponent’s play into three elements, namely, waiting, winning tiles, and winning scores, and train prediction models for those elements using game records of expert human players. Opponents’ moves in the Monte Carlo simulations are determined based on the probability distributions of the opponent models. We have evaluated the playing strength of the resulting program on the popular online Mahjong site “Tenhou”. The program has achieved a rating of 1718, which is significantly higher than that of the average human player.
PDF
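The simulation step described above amounts to sampling each opponent's moves from the learned prediction models instead of playing them uniformly at random. A hypothetical sketch of one such playout (the state and model interfaces are assumptions, not the authors' code):

```python
# One Monte Carlo playout with model-based opponents (illustrative only).
import random

def simulate_once(state, my_policy, opponent_models):
    """Play a simulated game to the end and return the program's final score."""
    while not state.is_terminal():
        player = state.current_player()
        if player == 0:                       # the program itself
            action = my_policy(state)
        else:                                 # an opponent: sample from its learned model
            actions, probs = opponent_models[player].action_distribution(state)
            action = random.choices(actions, weights=probs)[0]
        state = state.apply(action)
    return state.score(player=0)
```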
No. | Paper Title and Authors | First Page of Paper
1
Regulation of Exploration for Simple Regret Minimization in Monte-Carlo Tree Search
Yun-Ching Liu and Yoshimasa Tsuruoka

Abstract—The application of multi-armed bandit (MAB) algorithms was a critical step in the development of Monte-Carlo tree search (MCTS). One example is the UCT algorithm, which applies the UCB bandit algorithm. Various studies have been conducted on applying other bandit algorithms to MCTS. Simple regret bandit algorithms, which aim to identify the optimal arm after a number of trials, have been of great interest in various fields in recent years. However, simple regret bandit algorithms tend to spend more time sampling suboptimal arms, which may be a problem in the context of game tree search. In this research, we propose combined confidence bounds, which utilize the characteristics of the confidence bounds of the improved UCB and UCB√ algorithms to regulate exploration for simple regret minimization in MCTS. We demonstrate that the combined confidence bounds bandit algorithm has better empirical performance than the UCB algorithm on the MAB problem. We show that combined confidence bounds MCTS (CCB-MCTS) outperforms plain UCT on the game of 9 × 9 Go and exhibits good scalability. We also show that the performance of CCB-MCTS can be further enhanced by applying the all-moves-as-first (AMAF) heuristic.
PDF
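For reference, the cumulative-regret UCB1 rule that plain UCT applies at every tree node is sketched below; the paper's combined confidence bound, which blends the bounds of the improved UCB and UCB√ algorithms, is not reproduced here.

```python
# Standard UCB1 selection as used by UCT (baseline only; not the CCB bound).
import math

def ucb1_select(children, exploration=math.sqrt(2)):
    """children: list of (mean_reward, visit_count) pairs; returns the index to sample."""
    total_visits = sum(n for _, n in children)
    def bound(child):
        mean, n = child
        if n == 0:
            return float("inf")       # always try unvisited arms first
        return mean + exploration * math.sqrt(math.log(total_visits) / n)
    return max(range(len(children)), key=lambda i: bound(children[i]))
```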
2
An Experimental Study of Action Selection Mechanisms to Create an Entertaining Opponent
Nick Sephton, Peter Cowling and Nicholas Slaven

Abstract—The final step in the Monte Carlo Tree Search algorithm is to select the action to play from the root level of the tree. Experimentation on modifying this selection mechanism has been somewhat limited to date, particularly with respect to aspects other than playing strength. This paper investigates modifications of the selection mechanism in an attempt to produce a more entertaining opponent in the strategic card game Lords of War. These selection mechanisms are played against our most effective Information Set MCTS (ISMCTS) agent, and we evaluate them using measures of performance and complexity. An interesting side effect is that one of the action selection mechanisms results in a significant improvement in ISMCTS playing strength. We also experiment with online tuning of configuration parameters in an attempt to create an agent with dynamically scaling playing strength.
PDF
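For context, the two most common rules for the final action choice at the MCTS root are sketched below ("max child" and "robust child"); the entertainment-oriented mechanisms actually studied in the paper are not reproduced here.

```python
# Standard final-move selection rules at the MCTS root (context only).

def max_child(root_children):
    """Pick the action with the highest mean reward. root_children: {action: (mean, visits)}."""
    return max(root_children, key=lambda a: root_children[a][0])

def robust_child(root_children):
    """Pick the most-visited action, usually a more stable choice."""
    return max(root_children, key=lambda a: root_children[a][1])
```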
3
An Improved Approach to Reinforcement Learning in Computer Go
Michael Dann, Fabio Zambetta and John Thangarajah

Abstract—Monte-Carlo Tree Search (MCTS) has revolutionised Computer Go, with programs based on the algorithm achieving a level of play that previously seemed decades away. However, since the technique involves constructing a search tree, its performance tends to degrade in larger state spaces. Dyna-2 is a hybrid approach that attempts to overcome this shortcoming by combining Monte-Carlo methods with state abstraction. While not competitive with the strongest MCTS-based programs, the Dyna-2-based program RLGO achieved the highest ever rating by a traditional program on the 9×9 Computer Go Server. Plain Dyna-2 uses ε-greedy exploration and a flat learning rate, but we show that the performance of the algorithm can be significantly improved by making some relatively minor adjustments to this configuration. Our strongest modified program achieved an Elo rating 289 points higher than the original in head-to-head play, equivalent to an expected win rate of 84%.

PDF
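As context for the configuration being adjusted, the sketch below shows plain ε-greedy action selection together with one hypothetical way to let the exploration rate or learning rate decay with experience; the authors' actual modifications are not reproduced here.

```python
# Epsilon-greedy selection and a simple decay schedule (illustrative assumptions).
import random

def epsilon_greedy(q_values, actions, epsilon):
    """With probability epsilon explore uniformly, otherwise act greedily on Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values.get(a, 0.0))

def decayed(initial, visits, halflife=100.0):
    """Hypothetical schedule: the parameter shrinks as a state is visited more often."""
    return initial * halflife / (halflife + visits)
```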
4 (Winner)
Convergence and Correctness Analysis of Monte-Carlo Tree Search Algorithms: A Case Study of 2 by 4 Chinese Dark Chess
Hung-Jui Chang, Chih-Wen Hsueh and Tsan-Sheng Hsu

Abstract—The convergence and correctness of the UCT algorithm is an active research problem. Previous research has shown the convergence of UCT-based algorithms on simultaneous-turn or random-turn games, but little is known for traditional alternating-turn games. In this paper, we analyze the performance of the UCT algorithm in a two-player imperfect-information alternating-turn game, 2 × 4 Chinese dark chess (CDC). The performance of the UCT algorithm is measured by its correctness rate and convergence speed. The correctness rate is defined as the percentage of cases in which the UCT algorithm outputs the move with the best game-theoretic value. The convergence speed is measured by the entropy of the set of moves output by the UCT algorithm using the same number of iterations. Experimental results show the convergence of the UCT algorithm in CDC, which can also be applied to explain the existence of diminishing returns in the UCT algorithm. Experimental results also show that the UCT algorithm does not always output moves with the best game-theoretic value, but rather those with the highest chance of winning.
PDF
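The two measures defined in the abstract can be computed directly from the moves returned by repeated UCT runs at a fixed iteration budget. A small sketch (the inputs are assumed to be plain lists of moves):

```python
# Correctness rate and move entropy over repeated UCT runs (illustrative sketch).
import math
from collections import Counter

def correctness_rate(output_moves, best_moves):
    """Fraction of runs whose chosen move has the best game-theoretic value."""
    return sum(1 for m in output_moves if m in best_moves) / len(output_moves)

def move_entropy(output_moves):
    """Entropy of the empirical move distribution; lower means the runs agree more."""
    n = len(output_moves)
    return -sum((c / n) * math.log2(c / n) for c in Counter(output_moves).values())
```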