Lab 4: General Game Playing
Due Feb. 29 by midnight
General game players are systems able to accept descriptions of arbitrary games at runtime and able to use such descriptions to play those games effectively without human intervention.
In other words, they do not know the rules until the games start.
Unlike specialized game players, general game players cannot rely on algorithms designed in advance for specific games.
Starting point code
You are encouraged to work with a partner on this lab. Please read the
following instructions carefully.
- First you need to run the setup script to create a git
repository for the lab.
If you want to work alone do:
setup63-Lisa labs/04 none
If you want to work with a partner, then one of you needs to
run the following command while the other one waits until it finishes:
setup63-Lisa labs/04 partnerUsername
Once the script finishes, the other partner should run it on their account.
- For the next step, only one partner should copy over the
starting point code:
cd ~/cs63/labs/04
cp -r /home/meeden/public/cs63/labs/04/* .
- Whether you are working alone or with a partner, you should now
add, commit, and push these files as shown below:
git add *.py
git add -f *.pyc
git commit -m "lab4 start"
git push
- If you are working with a partner, your partner can now
pull the changes in:
cd ~/cs63/labs/04
git pull
You are now ready to start the lab.
Introduction
In this lab you will use Monte Carlo tree search to implement a general game playing agent.
The only Python file you will need to modify is MCTS.py.
You have also been provided the following files:
- game0.py, which implements 6x6 Hex
- game1.pyc, which implements 6x8 Connect Four
- game2.pyc, which implements a mystery game
All three games use the same interface, explained below.
Your agent will also be tested on a fourth game not included in the starter code.
Your code will read and write state table files to save its search progress; these files are also explained below.
To get started, try running MCTS on one of the games.
The first argument specifies a game number (0, 1, or 2), while the second argument specifies the number of rollouts MCTS should perform for each decision.
Since we haven't implemented MCTS yet, let's set this to zero.
./MCTS.py 0 0
This will play a game between your MCTS agent and an agent that moves randomly.
Right now, your MCTS agent is also moving randomly, so the results may not be very exciting.
General Game Interface
Your code is allowed to interact with the various games via the following methods and fields:
- newGame(): returns a State object
- State.getMoves(): returns a collection of moves
- State.nextState(move): returns a State
- State.isTerminal(): returns a boolean
- State.value(): returns +1, 0, or -1 for a player 1 win, draw, or loss; only valid for terminal states
- State.turn: +1 for the 1st player; -1 for the 2nd
- State.key: a hashable string representation of the state
Each game also has some fields and/or methods whose names begin with an underscore (for example _getNeighbors() and State._board in game0.py).
You are not permitted to directly access these.
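For example, here is a minimal sketch of a complete random game played
through this interface (a sketch only: it assumes newGame can be
imported from the game0 module; the other games work the same way):

    import random

    from game0 import newGame  # 6x6 Hex; the other games share this interface

    # Play one game in which both sides move randomly, touching only
    # the public methods and fields described above.
    state = newGame()
    while not state.isTerminal():
        move = random.choice(list(state.getMoves()))
        state = state.nextState(move)

    # value() is only valid at terminal states: +1 for a player 1 win,
    # 0 for a draw, -1 for a player 1 loss.
    print("outcome:", state.value())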
Implementation Tasks
- Node class: You will need to implement
the Node.UCBWeight() method used in node selection and
the Node.updateValue() method used for value backpropagation.
It is up to you how to define the value of a node; the only
requirements are that the value increases as the win rate increases,
and that the value never causes UCBWeight() to return a negative
result. You may also wish to store some extra information in
the Node class to facilitate these computations. (A hedged sketch
of one possible approach appears after this list.)
- MCTS: Your primary implementation task is to code the
Monte Carlo tree search algorithm. The MCTS() function takes
a node from which to start the search, and a number of rollouts to
perform. Each rollout navigates explored nodes via UCB sampling until
it reaches the frontier, then expands one new node, then performs a
random playout to a terminal state, then propagates the outcome back
to expanded nodes along the path. When all rollouts are complete, the
function deterministically returns the best move available from its
start node. Pseudocode for MCTS is provided in the next section. You
will certainly want to break this task down using several helper
functions.
- Saving the search tree: To improve search performance
over time, we want to save a portion of the search tree after each
game we play. We will do this by storing a state table dictionary
that maps a game state's key to a (visits, value) pair. The starter
code takes care of reading and writing JSON files that store the state
table between runs. Your task is to translate this state table into a
partial search tree as a starting point for MCTS, and to convert your
search tree back into a state table after each run.
The build_table() function creates a state table dictionary
that can be converted to JSON format. Not all states in the search
tree will be part of the state table: only those reachable from the
root node within opening_depth moves.
The build_tree() function creates a Node object for
the start state, and looks up the start state's visits and value in
the state table. It then iteratively constructs nodes for all
reachable states in the state table by generating possible next states
and checking for their keys.
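To make the Node bullet above more concrete, here is a hedged sketch of
one possible way to organize the bookkeeping. The attribute names
(visits, value, parent, children) and the particular value definition
are illustrative assumptions, not the starter code's actual design:

    from math import log, sqrt

    UCB_CONST = 1.0  # exploration constant; see the Extensions section

    class Node:
        """Sketch only; the starter code defines the real Node class."""

        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = {}  # move -> Node, filled in during expansion
            self.visits = 0
            self.value = 0.0    # running average kept in [0, 1]

        def updateValue(self, outcome):
            # outcome is +1, 0, or -1 from player 1's perspective. One
            # possible choice (yours may differ): average the score for
            # the player who moved into this node, mapped from [-1, 1]
            # to [0, 1] so UCBWeight() can never be negative. Assumes
            # visits was already incremented, as in the pseudocode below.
            score = (1 + outcome * -self.state.turn) / 2
            self.value += (score - self.value) / self.visits

        def UCBWeight(self):
            # UCB1: value estimate plus an exploration bonus that shrinks
            # as this node is visited more often. Assumes selection only
            # reaches nodes that have been visited at least once.
            return self.value + UCB_CONST * sqrt(log(self.parent.visits) / self.visits)

One convenient property of keeping the value in [0, 1] is that both
requirements from the Node bullet above are satisfied automatically.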
Monte Carlo Tree Search
Each time it is your agent's turn to move, the main function
will call MCTS(), passing in a node from which to start the
search, and a number of rollouts to perform. Your MCTS()
function should implement the following pseudocode. Remember
that root in this pseudocode refers to the current position
you are in the game, not the overall root of the tree.
function MCTS(root, rollouts)
    for the number of rollouts
        node = root
        # Selection
        while node is not terminal and all children of node are expanded
            node = UCB_sample(node)
        # Expansion
        if node is not terminal
            node = expand a randomly selected unexpanded child of node
        # Simulation
        outcome = random_playout(node.state)
        # Backpropagation
        backpropagation(node, root, outcome)
    return the move that generates the highest-value successor of root
    (from the current player's perspective)

function UCB_sample(node)
    weights = [child.UCBWeight() for each child of node]
    distribution = normalize(weights)
    return one child chosen as a random sample from distribution

function random_playout(state)
    while state is not terminal
        state = random successor of state
    return the value of the terminal state

function backpropagation(node, root, outcome)
    while node != root
        increment the visit count of node
        call updateValue on node with outcome
        node = parent of node
    increment the visit count of root
    call updateValue on root with outcome
UCBWeight() is calculated according to the formula in the node-selection
section of mcts.ai: the node's value estimate plus UCB_CONST times
sqrt(ln(parent's visit count) / node's visit count).
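If it helps, here is one possible shape for the UCB_sample helper in
Python (a sketch under the same assumptions as the Node sketch above;
note that random.choices normalizes the weights itself):

    import random

    def UCB_sample(node):
        # Weighted random selection over the children of a fully
        # expanded node. Weights must be non-negative, which the
        # value definition in the Node sketch guarantees.
        children = list(node.children.values())
        weights = [child.UCBWeight() for child in children]
        return random.choices(children, weights=weights, k=1)[0]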
Extensions
When you have completed the above implementation tasks, you are encouraged to explore some of the following extensions:
- Play your agent against itself: replace the randomMove opponent
with a variant of your MCTS agent, such as one produced by the
variations described below.
- Try varying the amount of prior knowledge with which the search is seeded by setting the depth parameters.
- Try varying the number of rollouts.
- Try varying the UCB_CONST parameter to trade off exploration vs exploitation.
How does the optimal UCB_CONST value depend on the game?
How does it depend on the amount of prior knowledge?
The number of rollouts?
- Try a different value estimate; how does this change interact with changing UCB_CONST?
- Can you determine what mystery game is implemented by game2.pyc?
- Implement your own game using the GGP interface and try playing it with your MCTS agent.
Submitting your code
Before submitting, ensure that you have added your name(s) to the
top-level comments. To submit your code, you need to use git to add,
commit, and push the files that you modified.
cd ~/cs63/labs/04
git add MCTS.py
git commit -m "final version"
git push