For this project you will use some form of reinforcement learning to teach a robot to perform a task of your choice. As a first step you must determine an appropriate task given the available tools. You should assume you will be using the Player/Stage simulator to control a Pioneer robot through Pyro. Here are some of the capabilities that you could incorporate into your task:
In order to use a reinforcement learning method you will need to create a reinforcement procedure. Typically this procedure takes two states: the state prior to executing an action and the state that resulted from executing that action. It then returns a reinforcement value: negative for punishment, zero for no feedback, or positive for reward. If you plan on using a Genetic Algorithm to evolve the weights of a neural network, then the reinforcement values must always be positive. If you plan on using Q-learning with a neural network to learn expected values, then the reinforcement values must be between -1 and 1.
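As a rough illustration, here is a minimal sketch of such a procedure for the Q-learning variant, where the returned value must fall between -1 and 1. The stalled and speed attributes on the state objects are hypothetical examples; substitute whatever sensor readings your states actually record.

def reinforce(prevState, newState):
    # Sketch of a reinforcement procedure for the Q-learning variant:
    # the returned value must lie between -1 and 1.  The 'stalled' and
    # 'speed' attributes are hypothetical examples of what a state
    # might record.
    if newState.stalled:
        return -1.0                       # punishment
    if newState.speed > prevState.speed:
        return 0.5                        # reward
    return 0.0                            # no feedback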
The frequency with which you provide a non-zero reinforcement value will determine how difficult the task is to learn. Delayed tasks, where reinforcement is only given at the time of goal achievement, are the hardest. Immediate tasks, where reinforcement is given at every time step, are the easiest. Intermediate tasks, where reinforcement is sporadic, are also possible. If you are using a Genetic Algorithm, then every task is essentially delayed because feedback from the fitness function is only given at the end of a task.
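To see why GA-based learning is essentially delayed, here is a hedged sketch of how per-step reinforcement collapses into a single fitness score reported only at the end of a trial. It reuses the reinforce procedure sketched above; brain.step() and brain.getState() are hypothetical stand-ins for however you advance the simulation and read the robot's state.

def evaluateFitness(brain, steps=200):
    # Sum the per-step reinforcement over an entire trial, so the GA
    # only ever sees one delayed score at the end.
    # brain.step() and brain.getState() are hypothetical methods for
    # advancing the simulation and reading the robot's state.
    total = 0.0
    prevState = brain.getState()
    for _ in range(steps):
        brain.step()
        newState = brain.getState()
        total += reinforce(prevState, newState)
        prevState = newState
    return max(total, 0.0)    # GA fitness values must stay positive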
Here is an example of how to accomplish this in Pyro. In this example the learning robot's task is to constantly seek a particular color while going as fast as possible and not getting stuck. The only object in the world that has the desired color is another robot that is being controlled by a simple behavior-based wander brain.
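One way the reinforcement for this particular task might look is sketched below. The colorArea, speed, and stalled attributes are hypothetical stand-ins for the camera blob size, translation speed, and stall readings you would actually pull out of Pyro, and the specific thresholds and values are only illustrative.

def seekColorReinforce(prevState, newState):
    # Hypothetical per-step reinforcement for the color-seeking task:
    # punish getting stuck, reward making the target color appear
    # larger in the camera image while moving quickly.
    if newState.stalled:
        return -1.0
    value = 0.0
    if newState.colorArea > prevState.colorArea:
        value += 0.5          # the colored robot looks closer
    if newState.speed > 0.2:
        value += 0.5          # moving at a good speed
    return value              # 0.0, 0.5, or 1.0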
You'll need at least three programs to implement a GA-based reinforcement learning system:
To begin a run of the genetic algorithm, do:

python Evolve.py
If it doesn't work, be sure that you have set the PYRO environment variable in your .bashrc file as shown below:
export PYRO=/usr/local/pyro
Once evolution is complete, you can test the saved weights by doing:

python TestResults.py results.wts
Notice in Evolve.py that at the end of every generation the weights of the best individual in the population are saved. Be sure to do this in your genetic algorithm as well. Then, if your simulation is interrupted for some reason, you'll be able to re-seed a new population with the saved weights and restart evolution from that point rather than having to start from scratch. Also notice that you can use the flush() method on a file pointer to force the data to be written to the file immediately.
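Below is a minimal, self-contained sketch of this pattern. runOneGeneration is a placeholder for one generation of your GA, and the overwrite-one-line-of-weights format is only an assumption; the point is rewriting the file each generation and calling flush().

import random

def runOneGeneration():
    # Placeholder standing in for one generation of your GA; it should
    # return the best individual's weights as a flat list of floats.
    return [random.uniform(-1, 1) for _ in range(20)]

fp = open("results.wts", "w")
for generation in range(100):
    bestWeights = runOneGeneration()
    fp.seek(0)
    fp.truncate()                 # overwrite last generation's weights
    fp.write(" ".join("%f" % w for w in bestWeights) + "\n")
    fp.flush()                    # force the write to happen immediately
fp.close()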
For this particular task, with a population size of 20, each generation takes approximately 12 minutes. To complete 100 generations requires about 20 hours. Therefore it is crucial that you get your simulation up and running as soon as possible so that you will have time to complete several runs of the genetic algorithm.
The files you'll need are in /usr/local/pyro/plugins/worlds/Stage/:

usc_pioneer_gripper.inc usc_pioneer.inc pioneer.inc

And if you want to use one of the existing bitmaps, such as rink.pnm.gz, you'll need to copy that as well. Once you have your world file and all of these files in the same directory, you can test out your world by doing:
pyro -s Stage -w yourworld
The general form of the command to run pyro with a robot and a brain is:

pyro -s simulator -w world -r robot -b brain

For example:
pyro -s Stage -w tutorial.world -r Player2 -b BBWander.py

To save computation time during the learning process, you can avoid bringing up the GUI by adding the following at the end of the command line:
-g tty
Sometimes the simulator's socket is left open even though pyro is no longer running. You can check for open sockets by doing:

netstat --inet --tcp -lp

If the socket is open and there is NOT an active pyro process running, then the socket can be closed by root by doing kill -9 PID, where PID is the process id listed for that socket.
You will turn in a 4-6 page paper describing your project. Your paper should include the following:
Your grade will not be based on whether your experiment succeeds or fails. Negative results are as important as positive results. Your grade will be based solely on the thoroughness and readability of your paper.