CS81: Lab 4

In this lab you will implement novelty search, introduced by Joel Lehman and Kenneth O. Stanley, who argue that objective fitness functions can be deceptive, leading an evolutionary search to local maxima rather than towards the desired goal. In contrast, novelty search ignores the objective and simply searches for novel behavior, and therefore cannot be deceived.

In this lab we will focus on novelty search in isolation, without using it as part of an evolutionary search. We will explore how it operates on randomly generated behaviors. In the next lab we will apply novelty search to a robot control problem.

Git started

Go through the following steps to setup your directory for this lab.

First you need to run setup81 to create a git repository for the lab.
If you want to work alone do:
```
setup81 labs/04 none
```
If you want to work with a partner, then one of you needs to run the following while the other one waits until it finishes.
```
setup81 labs/04 partnerUsername
```
Once the script finishes, the other partner should run it on their account.
For the next step only one partner should copy over the starting point code.
```
cd ~/cs81/labs/04
cp -r ~meeden/public/cs81/labs/04/* ./
```
Whether you are working alone or with a partner, you should now add all of the files to your git repo, commit, and push them as shown below.
```
git add *
git commit -m "lab4 start"
git push
```
If you are working with a partner, your partner can now pull the changes in.
```
cd ~/cs81/labs/04
git pull
```

Implementing novelty search

Before trying to implement novelty search, review the details of the algorithm starting at the bottom of page 12 in the paper Abandoning objectives: Evolution through the search for novelty alone that we discussed this week.

Recall that novelty search maintains an archive of behaviors, comparing current behaviors against the saved ones to determine whether a current behavior is novel enough to add to the archive. A current behavior's novelty is defined as the average distance to its k nearest neighbors in the archive.

We will be representing a behavior as a list of points, where the points may be of some arbitrary dimension, and the list may be of some arbitrary length. For example, consider the coverage task that we worked on in the previous lab. One way to represent the behavior of an agent in this task is to record the new grid locations it visits in order. The length of this behavior space would be the total number of grid locations, which for a 15x15 grid is 225. The length of each point would be 2, representing the x, y location of the grid square.

In the file noveltySearch.py, implement a python class called NoveltySearch with the following methods:

__init__(self, k, limit, threshold, pointLength, behaviorLength)
Create class variables to store the archive as an initially empty list, the k (an integer) to be used in calculating nearest neighbors, the limit (an integer) for the maximum size of the archive, the threshold (a float) for how novel an example has to be before it will be added the archive, the pointLength (an integer) for the size of the points in the behavior description, and behaviorLength (an integer) for the number of points within a behavior description.
distance(self, p1, p2)
Returns the distance between points p1 and p2 which are assumed to be lists or tuples of equal length. To calculate distance, sum up the squared differences between each component of the points. Return the square root of this sum. This method should work for points of any length.
distFromkNearest(self, behavior)
Returns the sum of the distances of the given behavior (represented as a list of points) with its k-nearest neighbors in the archive. Uses the distance method as a helper.
sparseness(self, behavior)
Returns the sparseness of the given behavior. Equation 1 on page 13 of the paper defines sparseness based on individual points. We want to generalize this definition to behaviors (sequences of points). We will use the sum of component-wise distances as the sparseness of a behavior. This method should use the method distFromkNearest as a helper in calculating this value. If the archive is empty, then return the sys.maxint as the sparseness. If the archive has less than k behaviors, then use the archive's current size to compute the average distance.
checkArchive(self, behavior, otherInfo=None)
This method checks whether the given behavior is novel enough to add to the archive and returns its sparseness. The parameter otherInfo can be used to provide additional information that will be stored with the behavior in the archive. This will be helpful when doing analyses of the evolutionary run, once it completes. The archive is represented as a list of tuples, as shown below, where the behavior stored at index 0 is the oldest, and the behavior at the end of the archive is the newest.
```
[ ( behavior0, novelty0, otherInfo0 ),
  ( behavior1, novelty1, otherInfo1 ),
...
]
```
When the size of the archive is less than k, then always add the behavior. Otherwise check if the sparseness of the given behavior is greater than the threshold. If so, add this behavior to the archive. After adding a behavior, if the archive size is now greater than the limit, you should also remove the oldest behavior from the archive.

Each method of your class should include a triple-quoted comment indented underneath the def line which provides details about its parameters, return value (if applicable), and its purpose.

You've implemented the basic methods needed to do novelty search. Use git to add, commit, and push your changes.

Unit testing individual methods

In the function unitTests write code that allows you to individually test each of the methods in the following order:

distance
distFromkNearest
sparseness
checkArchive

Create test cases for which you determine the correct result and then confirm that your code responds appropriately. Be sure to test different lengths of behaviors and different lengths of points.

You will be evaluated on the thoroughness of your unit tests.

Use git to add, commit, and push your changes.

Adding analysis tools to novelty search

Analyzing the outcome of a machine learning process can be tricky because it may incorporate randomness that leads to different results for each run. It is often the case that you will need to implement additional functionality to assist you in analyzing the results. In this case we are going to add a method to save the archive, allowing us to plot the novel behaviors found, as well as a method to save a history of the archive's growth over time, allowing us to determine good settings for the novelty threshold.

saveArchive(self, filename)
This method saves the archived behaviors to the given filename with an added .archive extension. Remember that each behavior is a list of points. For example, suppose we have behaviors that consist of 3 points, where each point is 2 dimensions. We will write the output to the file in the following format. Note that the double quote at the start of each behavior name is needed.
```
"Behavior0
0.377286 0.699257 
0.948053 0.617662 
0.550392 0.265316 

"Behavior1
0.398257 0.138859 
0.529999 0.413047 
0.031389 0.538371 
...
```
Given a file in this format called test.archive we could plot it like this:
```
$ xgraph -tk -P test.archive
```
saveGrowth(self, filename)
This method saves a record of when additions were made to the archive using the given filename with an added .growth extension.
In order to accomplish this we'll need to add a few more class variables. In the constructor, initialize self.steps to 0 and self.growth to the empty list.
In the checkArchive method, increment self.steps every time it is called. When an addition is made to the archive, append a list to self.growth containing the current step and the current size of the archive.
In the saveGrowth method use the self.growth list to write out a file in the following format:
```
1 1
2 2
3 3
4 4
5 5
124 6
489 7
```
After a run completes, you can plot the growth using the following command:
```
$ xgraph -P test.growth
```

Add more unit tests to your main program for these two new methods.

Use git to add, commit, and push your changes.

Experiment with novelty search parameters

In order to gain more intuition about how novelty search operates, you will explore how various parameter settings affect the contents of the archive. Let's focus on the threshold parameter since it is crucial in determining when a behavior is novel enough to be added to the archive.

In your main program, instantiate a novelty search with a k of 5, a limit of 30, a pointLength of 2, and a behaviorLength of 3. Now we will try to determine a good threshold value. Let's assume that our point values will be random numbers in the range [0, 1].

Given these parameter settings what is the maximum sparseness value possible? Figure this out and write it as a comment in your main program.

Based on this maximum value, pick a threshold in the range [0, max]. Loop 1000 times, generating a random behavior and calling checkArchive on that behavior. After the loop, save the archive and the growth. Run the program and check the results. Smaller thresholds should result in more additions, with archive hitting its limit in size more quickly. For larger thresholds the archive may never reach its size limit. What threshold value seems to strike the best balance of using the capacity of the archive at a reasonable pace? Again, write your answer as a comment.

Submit

When you are completely done, be sure that you have pushed all of your changes. Use the git status command to verify this. If there are uncommitted changes, be sure to do another add, commit, and push cycle.

CS81 Lab 4: Novelty Search