For this project you will use some form of reinforcement learning to teach a robot to perform a task of your choice. As a first step you must determine an appropriate task given the available tools. You should assume you will be using the Player/Stage simulator to control a Pioneer robot through Pyro. Here are some of the capabilities that you could incorporate into your task:
In order to use a reinforcement learning method you will need to create a reinforcement procedure. Typically this procedure takes two states: the state prior to executing an action and the state that resulted from executing that action. It then returns a reinforcement value: negative for punishment, zero for no feedback, or positive for reward. If you plan on using a Genetic Algorithm to evolve the weights of a neural network, then the reinforcement values must always be positive. If you plan on using Q-learning with a neural network to learn expected values, then the reinforcement values must be between -1 and 1.
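As a rough illustration, here is a minimal sketch of such a procedure for the Q-learning variant, where the returned value must fall between -1 and 1. The stalled and speed attributes on the state objects are hypothetical examples; substitute whatever sensor readings your states actually record.

def reinforce(prevState, newState):
    # Sketch of a reinforcement procedure for the Q-learning variant:
    # the returned value must lie between -1 and 1.  The 'stalled' and
    # 'speed' attributes are hypothetical examples of what a state
    # might record.
    if newState.stalled:
        return -1.0                       # punishment
    if newState.speed > prevState.speed:
        return 0.5                        # reward
    return 0.0                            # no feedback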
The frequency with which you provide a non-zero reinforcement value will determine how difficult the task is to learn. Delayed tasks, where reinforcement is only given at the time of goal achievement, are the hardest. Immediate tasks, where reinforcement is given at every time step, are the easiest. Intermediate tasks, where reinforcement is sporadic, are also possible. If you are using a Genetic Algorithm, then every task is essentially delayed because feedback from the fitness function is only given at the end of a task.
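To see why GA-based learning is essentially delayed, here is a hedged sketch of how per-step reinforcement collapses into a single fitness score reported only at the end of a trial. It reuses the reinforce procedure sketched above; brain.step() and brain.getState() are hypothetical stand-ins for however you advance the simulation and read the robot's state.

def evaluateFitness(brain, steps=200):
    # Sum the per-step reinforcement over an entire trial, so the GA
    # only ever sees one delayed score at the end.
    # brain.step() and brain.getState() are hypothetical methods for
    # advancing the simulation and reading the robot's state.
    total = 0.0
    prevState = brain.getState()
    for _ in range(steps):
        brain.step()
        newState = brain.getState()
        total += reinforce(prevState, newState)
        prevState = newState
    return max(total, 0.0)    # GA fitness values must stay positive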
Here is an example of how to accomplish this in Pyro. In this example the learning robot's task is to constantly seek a particular color while going as fast as possible and not getting stuck. The only object in the world that has the desired color is another robot that is being controlled by a simple behavior-based wander brain.
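One way the reinforcement for this particular task might look is sketched below. The colorArea, speed, and stalled attributes are hypothetical stand-ins for the camera blob size, translation speed, and stall readings you would actually pull out of Pyro, and the specific thresholds and values are only illustrative.

def seekColorReinforce(prevState, newState):
    # Hypothetical per-step reinforcement for the color-seeking task:
    # punish getting stuck, reward making the target color appear
    # larger in the camera image while moving quickly.
    if newState.stalled:
        return -1.0
    value = 0.0
    if newState.colorArea > prevState.colorArea:
        value += 0.5          # the colored robot looks closer
    if newState.speed > 0.2:
        value += 0.5          # moving at a good speed
    return value              # 0.0, 0.5, or 1.0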
You'll need at least three programs to implement a GA-based reinforcement learning system:
To begin a run of the genetic algorithm, do:

python Evolve.py
If it doesn't work, be sure that you have set the PYRO environment variable in your .bashrc file as shown below:
export PYRO=/usr/local/pyro
Once evolution is complete, you can test the saved weights by doing:

python TestResults.py results.wts
Notice in Evolve.py that at the end of every generation the weights of the best individual in the population are saved. Be sure to do this in your genetic algorithm as well. Then, if your simulation is interrupted for some reason, you'll be able to re-seed a new population with the saved weights and restart evolution from that point rather than having to start from scratch. Also notice that you can use the flush() method on a file pointer to force the data to be written to the file immediately.
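Below is a minimal, self-contained sketch of this pattern. runOneGeneration is a placeholder for one generation of your GA, and the overwrite-one-line-of-weights format is only an assumption; the point is rewriting the file each generation and calling flush().

import random

def runOneGeneration():
    # Placeholder standing in for one generation of your GA; it should
    # return the best individual's weights as a flat list of floats.
    return [random.uniform(-1, 1) for _ in range(20)]

fp = open("results.wts", "w")
for generation in range(100):
    bestWeights = runOneGeneration()
    fp.seek(0)
    fp.truncate()                 # overwrite last generation's weights
    fp.write(" ".join("%f" % w for w in bestWeights) + "\n")
    fp.flush()                    # force the write to happen immediately
fp.close()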
For this particular task, with a population size of 20, each generation takes approximately 12 minutes. To complete 100 generations requires about 20 hours. Therefore it is crucial that you get your simulation up and running as soon as possible so that you will have time to complete several runs of the genetic algorithm.
The files you'll need are in /usr/local/pyro/plugins/worlds/Stage/:

usc_pioneer_gripper.inc usc_pioneer.inc pioneer.inc

And if you want to use one of the existing bitmaps, such as rink.pnm.gz, you'll need to copy that as well. Once you have your world file and all of these files in the same directory, you can test out your world by doing:
pyro -s Stage -w yourworld
The general form of the command to run pyro with a robot and a brain is:

pyro -s simulator -w world -r robot -b brain

For example:
pyro -s Stage -w tutorial.world -r Player2 -b BBWander.py

To save computation time during the learning process, you can avoid bringing up the GUI by adding the following at the end of the command line:
-g tty
Sometimes the simulator's socket is left open even though pyro is no longer running. You can check for open sockets by doing:

netstat --inet --tcp -lp

If the socket is open and there is NOT an active pyro process running, then the socket can be closed by root by doing kill -9 PID, where PID is the process id listed for that socket.
You will turn in a 4-6 page paper describing your project. Your paper should include the following:
Your grade will not be based on whether your experiment succeeds or fails. Negative results are as important as positive results. Your grade will be based solely on the thoroughness and readability of your paper.