The starting board for an 8x8 game of Hex.
- - - - - - - -
 \ · · · · · · · · \
  \ · · · · · · · · \
   \ · · · · · · · · \
    \ · · · · · · · · \
     \ · · · · · · · · \
      \ · · · · · · · · \
       \ · · · · · · · · \
        \ · · · · · · · · \
         - - - - - - - -
A point in the game after each player has made four moves.
- - - - - - - -
 \ · · · · · · ● ● \
  \ · · · · · · ● ● \
   \ · · · · · · ● · \
    \ · · · · · ● · · \
     \ · · · · · · · · \
      \ · ● · · · · · · \
       \ · · · ● · · · · \
        \ · · · · · · · · \
         - - - - - - - -
Blue has just won the game by creating a connected path between the two blue sides of the Hex board.
- - - - - - - -
 \ · · · · · · ● ● \
  \ · · · · · · ● ● \
   \ · · · · · · ● ● \
    \ · · · · · ● ● ● \
     \ · · · ● ● ● ● · \
      \ ● ● ● ● ● · ● · \
       \ ● ● ● ● · · · · \
        \ · · · · · · · · \
         - - - - - - - -
General game players are systems able to accept descriptions of arbitrary games at run time and able to use such descriptions to play those games effectively without human intervention. In other words, they do not know the rules until the games start. Unlike specialized game players, general game players cannot rely on algorithms designed in advance for specific games.
As in the previous lab, use Teammaker to form your team. You can log in to that site to indicate your partner preference. Once you and your partner have specified each other and the lab has been released, a GitHub repository will be created for your team.
The objective of this lab is to use Monte Carlo Tree Search to implement a general game playing agent.
The primary Python file you will be modifying is MonteCarloTreeSearch.py. You have also been provided several files that should look familiar from lab 3:
There are also several new Python programs:
Just like last week, all games are played using PlayGame.py. The interface has been updated slightly; you can use the -h option for more information.
To get started, try playing Hex against your partner. This game has a large branching factor, so you'll likely have to scroll up to see the game board between turns. The game is played on a grid where (0,0) is at the top left corner and (7,7) is the bottom right corner.
./PlayGame.py hex human human
To begin, open the file MonteCarloTreeSearch.py and complete the implementations of the two classes provided.
value = 1 + (wins - losses)/visits

The benefits of this formula are that it always lies between 0 (all rollouts lost) and 2 (all rollouts won), and that nodes with different visit counts can be compared directly.
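For instance (a sketch only; the class and attribute names here are assumptions, not the exact interface of the provided skeleton), a node could store these counts and compute its value on demand:

```python
class Node:
    """Sketch of an MCTS tree node (names are illustrative)."""

    def __init__(self, game_state):
        self.state = game_state
        self.children = {}   # maps move -> child Node
        self.wins = 0
        self.losses = 0
        self.visits = 0

    @property
    def value(self):
        # value = 1 + (wins - losses) / visits, so it ranges from
        # 0 (all losses) to 2 (all wins); unvisited nodes default to 0.
        if self.visits == 0:
            return 0.0
        return 1 + (self.wins - self.losses) / self.visits
```

With 988 wins, 12 losses, and 1000 visits this gives 1.976, matching the rounded root value of 1.98 in the example status shown later.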
Here's pseudocode that fleshes out these steps:
getMove(game_state)
    # Find or create node for game_state
    key = str(game_state)
    if key in tree
        curr_node = get node from tree using key
    else
        curr_node = create new node with game_state
        add curr_node to tree using key

    # Perform Monte Carlo Tree Search from that node
    MCTS(curr_node)

    # Determine the best move from that node
    bestValue = -float("inf"); bestMove = None
    for move, child_node in curr_node's children
        if curr_node's player is +1
            value = child_node.value
        else
            value = 2 - child_node.value
        if value > bestValue
            update bestValue and bestMove
    return bestMove
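The move-selection loop at the end of this pseudocode can be sketched as a standalone function (the tiny Node stand-in and its attribute names are assumptions for illustration, not the provided skeleton):

```python
class Node:
    """Minimal stand-in for an MCTS tree node (names are assumptions)."""
    def __init__(self, player):
        self.player = player    # +1 or -1: whose turn it is in this state
        self.children = {}      # maps move -> child Node
        self.value = 0.0        # 0..2, from player +1's perspective

def best_move(curr_node):
    # Node values are stored from player +1's perspective, so when it
    # is player -1's turn we invert them with 2 - value.
    best_value, best = -float("inf"), None
    for move, child in curr_node.children.items():
        value = child.value if curr_node.player == 1 else 2 - child.value
        if value > best_value:
            best_value, best = value, move
    return best

# Mirrors the example Nim status: taking 2 has the highest child value.
root = Node(+1)
for move, v in [(3, 0.00), (1, 1.27), (2, 1.99)]:
    root.children[move] = Node(-1)
    root.children[move].value = v
assert best_move(root) == 2
```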
./PlayGame.py nim mcts random

Here's an example status that might be printed for the root node after the first turn:
node wins 988, losses 12, visits 1000, value 1.98
child wins 0, losses 2, visits 2, value 0.00, move 3
child wins 7, losses 4, visits 11, value 1.27, move 1
child wins 981, losses 7, visits 988, value 1.99, move 2
Notice that the best move based on the rollouts is to take 2, which puts our opponent at 5 pieces. We saw in class that, with optimal strategy, playing from 5 pieces is a guaranteed loss. MCTS has also discovered this via the rollouts.
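This is easy to double-check with a brute-force solver. A sketch, assuming the one-pile variant used here (take 1, 2, or 3 pieces; whoever takes the last piece loses):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def to_move_loses(pieces):
    """True if the player to move loses with optimal play."""
    if pieces == 1:
        return True  # forced to take the last piece
    # Taking every remaining piece means taking the last one, so only
    # moves leaving at least one piece are worth considering; we lose
    # if all of them hand the opponent a winning position.
    return all(not to_move_loses(pieces - take)
               for take in (1, 2, 3) if take < pieces)

assert to_move_loses(5)        # playing from 5 is a guaranteed loss
assert not to_move_loses(7)    # from 7, taking 2 leaves the opponent at 5
```

The losing positions turn out to be exactly those where the piece count is 1 mod 4, which is why MCTS's choice of leaving 5 is optimal.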
Look at the numbers of wins, losses, and visits. You would expect that summing these values over the child nodes would give the totals at the root node. But here the children's losses sum to 13 and their visits to 1001, while the root records only 12 losses and 1000 visits.
What is going on? Remember that MCTS stores the tree in a dictionary that maps states to nodes. The states are represented by the number of pieces remaining and whose turn it is. From the starting state of (7, turn 1), we can reach three successor states: (6, turn -1), (5, turn -1), and (4, turn -1). It turns out that there is another way to reach this last successor state (4, turn -1): the max player takes 1, the min player takes 1, and the max player takes 1 again. This series of moves has a low value, so it is rarely tried in rollouts, but because of the UCB formula it typically does get explored at least once over the many rollouts performed. This is why the number of losses and visits differs from our expectations.
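The occasional exploration of low-value moves comes from the UCB score used during selection. A sketch of one common form (UCB1 adapted to this lab's 0-to-2 value scale; the exploration constant c and the attribute names are assumptions):

```python
import math

def ucb_score(child, parent_visits, c=1.0):
    """UCB score of a child during selection (sketch; child is
    assumed to expose wins, losses, and visits counts)."""
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    # Exploitation term: the node value from this lab's formula.
    # (For the min player you would invert it, as in getMove.)
    exploit = 1 + (child.wins - child.losses) / child.visits
    # Exploration bonus: grows as the parent is visited more, shrinks
    # as this child is visited more, so even low-value moves get
    # revisited occasionally over many rollouts.
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore
```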
Each rollout consists of four phases: selection, expansion, simulation, and backpropagation.
Pseudocode for MCTS is provided below:
MCTS(current_node)
    repeat num_rollouts times
        path = selection(current_node)
        selected_node = final node in path
        if selected_node is terminal
            outcome = winner of selected_node's state
        else
            next_node = expansion(selected_node)
            add next_node to end of path
            outcome = simulation(next_node's state)
        backpropagation(path, outcome)
    status(current_node)  # use for debugging

You will certainly want to break this task down using several helper methods, at least one for each phase of the algorithm.
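That loop can be sketched in Python as follows (the phase functions are parameters here only so the sketch is self-contained; in your solution they would be helper methods on your class, and the node attribute names are assumptions):

```python
def mcts(root, num_rollouts, selection, expansion, simulation,
         backpropagation):
    """Run num_rollouts rollouts of the four-phase MCTS loop (sketch)."""
    for _ in range(num_rollouts):
        # Selection: walk down the tree (e.g. by UCB) to a promising node.
        path = selection(root)
        selected = path[-1]
        if selected.is_terminal:
            outcome = selected.winner
        else:
            # Expansion: add one new child below the selected node.
            child = expansion(selected)
            path.append(child)
            # Simulation: play the rest of the game out randomly.
            outcome = simulation(child)
        # Backpropagation: update counts along the whole path.
        backpropagation(path, outcome)
```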
./PlayGame.py nim mcts random -game_args 7 -a1 100
Your output should look similar to the following. Though the numbers will not be exactly the same due to the randomness of the rollouts, the trends should be similar. Player 1 (MCTS) should win every time.
Nim: 7 Turn: 1
completed 100 rollouts
root wins 90, losses 10, visits 100, value 1.80
child wins 1, losses 2, visits 3, value 0.67, move 3
child wins 86, losses 5, visits 91, value 1.89, move 2
child wins 4, losses 3, visits 7, value 1.14, move 1
Move: 2
Nim: 5 Turn: -1
Move: 1
Nim: 4 Turn: 1
completed 100 rollouts
root wins 131, losses 2, visits 133, value 1.97
child wins 182, losses 0, visits 182, value 2.00, move 3
child wins 1, losses 2, visits 3, value 0.67, move 2
child wins 0, losses 2, visits 2, value 0.00, move 1
Move: 3
Nim: 1 Turn: -1
Move: 1
Nim: 0 Turn: 1
player 1 (MCTS) wins

Once you are confident that MCTS is working properly, you can turn off the status messages and explore how MCTS does with the much harder game Hex. Note that in Hex, Player 1 is blue and Player 2 is red. Try a game vs. a random opponent using 1000 rollouts per turn (note there will be a clear pause in play as MCTS completes these rollouts). Player 1 (MCTS) should win every time.
./PlayGame.py hex mcts random -a1 1000

Try games with two MCTS opponents pitted against one another. Give one version only 10 rollouts and the other version 1000 rollouts. The MCTS with more rollouts should always defeat the one with fewer rollouts.
./PlayGame.py hex mcts mcts -a1 1000 -a2 10

You should ensure that the MCTS with more rollouts is successful as either Player 1 or Player 2.
./PlayGame.py hex mcts mcts -a1 10 -a2 1000

Once you are confident that MCTS is working properly, try playing Hex against it.
./PlayGame.py hex mcts human -a1 1000

Can you beat it? Does it seem to have good strategies? Does its play improve if you give it more rollouts, say 2000 per turn?
For any extension that you try, describe what you did and the outcome in the file called extensions.md.