Lab 4: General Game Playing
Due Feb. 29 by midnight
General game players are systems able to accept descriptions of arbitrary games at runtime and able to use such descriptions to play those games effectively without human intervention.
In other words, they do not know the rules until the games start.
Unlike specialized game players, general game players cannot rely on algorithms designed in advance for specific games.
Starting point code
You are encouraged to work with a partner on this lab. Please read the
following instructions carefully.
- First you need to run the setup script to create a git
repository for the lab.
If you want to work alone do:
setup63-Lisa labs/04 none
If you want to work with a partner, then one of you needs to
run the following command while the other one waits until it finishes:
setup63-Lisa labs/04 partnerUsername
Once the script finishes, the other partner should run it on their account.
- For the next step, only one partner should copy over the
starting point code:
cd ~/cs63/labs/04
cp -r /home/meeden/public/cs63/labs/04/* .
- Whether you are working alone or with a partner, you should now
add, commit, and push these files as shown below:
git add *.py
git add -f *.pyc
git commit -m "lab4 start"
git push
- If you are working with a partner, your partner can now
pull the changes in:
cd ~/cs63/labs/04
git pull
You are now ready to start the lab.
Introduction
In this lab you will use Monte Carlo tree search to implement a general game playing agent.
The only Python file you will need to modify is MCTS.py.
You have also been provided the following files:
- game0.py, which implements 6x6 Hex
- game1.pyc, which implements 6x8 Connect Four
- game2.pyc, which implements a mystery game
All three games use the same interface, explained below.
Your agent will also be tested on a fourth game not included in the starter code.
Your code will read and write state table files to save its search progress; these files are also explained below.
To get started, try running MCTS on one of the games.
The first argument specifies a game number (0, 1, or 2), while the second argument specifies the number of rollouts MCTS should perform for each decision.
Since we haven't implemented MCTS yet, let's set this to zero.
./MCTS.py 0 0
This will play a game between your MCTS agent and an agent that moves randomly.
Right now, your MCTS agent is also moving randomly, so the results may not be very exciting.
General Game Interface
Your code is allowed to interact with the various games via the following methods and fields:
- newGame(): returns a State object
- State.getMoves(): returns a collection of moves
- State.nextState(move): returns a State
- State.isTerminal(): returns a boolean
- State.value(): returns +1, 0, or -1 for a player 1 win, draw, or loss; only valid for terminal states
- State.turn: +1 for the 1st player; -1 for the 2nd
- State.key: a hashable string representation of the state
Each game also has some fields and/or methods whose names begin with an underscore (for example _getNeighbors() and State._board in game0.py).
You are not permitted to directly access these.
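For example, here is a minimal sketch of a complete random game played
through this interface (a sketch only: it assumes newGame can be
imported from the game0 module; the other games work the same way):

    import random

    from game0 import newGame  # 6x6 Hex; the other games share this interface

    # Play one game in which both sides move randomly, touching only
    # the public methods and fields described above.
    state = newGame()
    while not state.isTerminal():
        move = random.choice(list(state.getMoves()))
        state = state.nextState(move)

    # value() is only valid at terminal states: +1 for a player 1 win,
    # 0 for a draw, -1 for a player 1 loss.
    print("outcome:", state.value())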
Implementation Tasks
- Node class: You will need to implement
the Node.UCBWeight() method used in node selection and
the Node.updateValue() method used for value backpropagation.
It is up to you how to define the value of a node; the only
requirements are that the value increases as the win rate increases,
and that the value never causes UCBWeight() to return a negative
result. You may also wish to store some extra information in
the Node class to facilitate these computations. (A hedged sketch
of one possible approach appears after this list.)
- MCTS: Your primary implementation task is to code the
Monte Carlo tree search algorithm. The MCTS() function takes
a node from which to start the search, and a number of rollouts to
perform. Each rollout navigates explored nodes via UCB sampling until
it reaches the frontier, then expands one new node, then performs a
random playout to a terminal state, then propagates the outcome back
to expanded nodes along the path. When all rollouts are complete, the
function deterministically returns the best move available from its
start node. Pseudocode for MCTS is provided in the next section. You
will certainly want to break this task down using several helper
functions.
- Saving the search tree: To improve search performance
over time, we want to save a portion of the search tree after each
game we play. We will do this by storing a state table dictionary
that maps a game state's key to a (visits, value) pair. The starter
code takes care of reading and writing JSON files that store the state
table between runs. Your task is to translate this state table into a
partial search tree as a starting point for MCTS, and to convert your
search tree back into a state table after each run.
The build_table() function creates a state table dictionary
that can be converted to JSON format. Not all states in the search
tree will be part of the state table: only those reachable from the
root node within opening_depth moves.
The build_tree() function creates a Node object for
the start state, and looks up the start state's visits and value in
the state table. It then iteratively constructs nodes for all
reachable states in the state table by generating possible next states
and checking for their keys.
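To make the Node bullet above more concrete, here is a hedged sketch of
one possible way to organize the bookkeeping. The attribute names
(visits, value, parent, children) and the particular value definition
are illustrative assumptions, not the starter code's actual design:

    from math import log, sqrt

    UCB_CONST = 1.0  # exploration constant; see the Extensions section

    class Node:
        """Sketch only; the starter code defines the real Node class."""

        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = {}  # move -> Node, filled in during expansion
            self.visits = 0
            self.value = 0.0    # running average kept in [0, 1]

        def updateValue(self, outcome):
            # outcome is +1, 0, or -1 from player 1's perspective. One
            # possible choice (yours may differ): average the score for
            # the player who moved into this node, mapped from [-1, 1]
            # to [0, 1] so UCBWeight() can never be negative. Assumes
            # visits was already incremented, as in the pseudocode below.
            score = (1 + outcome * -self.state.turn) / 2
            self.value += (score - self.value) / self.visits

        def UCBWeight(self):
            # UCB1: value estimate plus an exploration bonus that shrinks
            # as this node is visited more often. Assumes selection only
            # reaches nodes that have been visited at least once.
            return self.value + UCB_CONST * sqrt(log(self.parent.visits) / self.visits)

One convenient property of keeping the value in [0, 1] is that both
requirements from the Node bullet above are satisfied automatically.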
Monte Carlo Tree Search
Each time it is your agent's turn to move, the main function
will call MCTS(), passing in a node from which to start the
search, and a number of rollouts to perform. Your MCTS()
function should implement the following pseudocode. Remember
that root in this pseudocode refers to the current position
you are in the game, not the overall root of the tree.
function MCTS(root, rollouts)
    for the number of rollouts
        node = root
        # Selection
        while node is not terminal and all children of node are expanded
            node = UCB_sample(node)
        # Expansion
        if node is not terminal
            node = expand a randomly selected unexpanded child of node
        # Simulation
        outcome = random_playout(node.state)
        # Backpropagation
        backpropagation(node, root, outcome)
    return the move that generates the highest-value successor of root
    (from the current player's perspective)

function UCB_sample(node)
    weights = [child.UCBWeight() for each child of node]
    distribution = normalize(weights)
    return one child chosen as a random sample from distribution

function random_playout(state)
    while state is not terminal
        state = random successor of state
    return the value of the terminal state

function backpropagation(node, root, outcome)
    while node != root
        increment the visit count of node
        call updateValue on node with outcome
        node = parent of node
    increment the visit count of root
    call updateValue on root with outcome
UCBWeight() is calculated according to the formula in the node-selection
section of mcts.ai: the node's value estimate plus UCB_CONST times
sqrt(ln(parent's visit count) / node's visit count).
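If it helps, here is one possible shape for the UCB_sample helper in
Python (a sketch under the same assumptions as the Node sketch above;
note that random.choices normalizes the weights itself):

    import random

    def UCB_sample(node):
        # Weighted random selection over the children of a fully
        # expanded node. Weights must be non-negative, which the
        # value definition in the Node sketch guarantees.
        children = list(node.children.values())
        weights = [child.UCBWeight() for child in children]
        return random.choices(children, weights=weights, k=1)[0]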
Extensions
When you have completed the above implementation tasks, you are encouraged to explore some of the following extensions:
- Play your agent against itself: replace the randomMove opponent
with a variant of your MCTS agent, such as one produced by the
variations described below.
- Try varying the amount of prior knowledge with which the search is seeded by setting the depth parameters.
- Try varying the number of rollouts.
- Try varying the UCB_CONST parameter to trade off exploration vs exploitation.
How does the optimal UCB_CONST value depend on the game?
How does it depend on the amount of prior knowledge?
The number of rollouts?
- Try a different value estimate; how does this change interact with changing UCB_CONST?
- Can you determine what mystery game is implemented by game2.pyc?
- Implement your own game using the GGP interface and try playing it with your MCTS agent.
Submitting your code
Before submitting, ensure that you have added your name(s) to the
top-level comments. To submit your code, you need to use git to add,
commit, and push the files that you modified.
cd ~/cs63/labs/04
git add MCTS.py
git commit -m "final version"
git push