In this notebook you will learn how to use conx to create, train, and test simple neural networks. The conx library is built on top of keras, which is in turn built on top of tensorflow.
The following sections walk through the steps needed to create a neural network in conx.
For our first conx network, we will learn logical OR, which takes two input values representing True/False and outputs True/False.
from conx import *
or_net = Network("or")
or_net.add(Layer("input", 2))
or_net.add(Layer("output", 1, activation = "sigmoid"))
or_net.connect()
or_net.compile(loss="mse", optimizer=SGD(lr=0.1, momentum=0.9))
or_net.model.summary()
The number of parameters in a network is based on the sizes of its layers. Each layer automatically has an additional node, called the bias, which is fully connected to the next layer. So the number of parameters between any two layers is given by the formula: (size(layer1) + 1) * size(layer2). For this simple network, that is (size(input) + 1) * size(output) = (2 + 1) * 1 = 3.
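As a quick sanity check, here is a short plain-Python sketch (not a conx call) that applies this formula:

def num_params(size_from, size_to):
    # weights from every node in the first layer, plus the bias node,
    # to every node in the second layer
    return (size_from + 1) * size_to

print(num_params(2, 1))   # 3, matching the summary above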
In order to train a neural network you will need to define a dataset. For small datasets, you can manually add pairs of input and target patterns one at a time. We will use two inputs (with 1 representing True and 0 representing False) and output the OR of the two inputs, where OR(0,0)=0, OR(0,1)=1, OR(1,0)=1, and OR(1,1)=1. Calling the clear method first ensures that you won't add duplicate patterns as you re-run cells in the notebook.
or_net.dataset.clear()
or_net.dataset.add([0,0], [0])
or_net.dataset.add([0,1], [1])
or_net.dataset.add([1,0], [1])
or_net.dataset.add([1,1], [1])
or_net.dataset.summary()
Typically, you would divide the dataset into two subsets, one for training and another for validation. After each pass through the training set, the weights of the network are held fixed while the validation set is evaluated, providing feedback on how well the network is learning. This works best when the validation set is distinct from the training set. For small datasets, as with the OR problem, we don't have enough examples to create two separate sets. If no validation set is provided, the training set is also used as the validation set.
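As a minimal illustration of the idea in plain Python (for the OR problem we will not actually split the data):

# reserve the last 25% of the (input, target) pairs for validation
pairs = [([0,0],[0]), ([0,1],[1]), ([1,0],[1]), ([1,1],[1])]
split = int(0.75 * len(pairs))
train_pairs, val_pairs = pairs[:split], pairs[split:]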
Use the reset method, prior to training, to ensure that the network's parameters are re-initialized to random values. Then the train method is called. As used below, train takes the following parameters: epochs, the maximum number of passes through the training set; accuracy, the fraction of correct patterns at which training stops; tolerance, how close an output must be to its target to count as correct; and report_rate, how often (in epochs) progress is reported.
This network should train successfully in about 200 epochs. Notice that training error decreases and accuracy increases over the course of training.
or_net.reset()
or_net.train(epochs=500, accuracy=0.9, tolerance=0.2, report_rate=10)
Just as with training, when you test a network you can specify a tolerance level. You should use the same tolerance level when testing as you did when training.
or_net.test(tolerance=0.2)
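Conceptually, the tolerance check works something like this sketch (plain Python, not conx internals): an output counts as correct when it is within the tolerance of its target.

def within_tolerance(output, target, tolerance=0.2):
    return abs(output - target) <= tolerance

print(within_tolerance(0.85, 1.0))   # True:  |0.85 - 1.0| = 0.15 <= 0.2
print(within_tolerance(0.70, 1.0))   # False: |0.70 - 1.0| = 0.30 >  0.2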
Because the OR network is so small, we can easily look at all of its weights. You can view the weights of the network one layer at a time. The following command displays the weights coming into the output layer.
or_net.get_weights("output")
The last value is the weight from the bias node. Remember that the bias node has a constant input value of 1.0. Do these weights make sense to you? How do they work to correctly implement the OR function in the network?
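As a hint, here is a sketch of how a single sigmoid output unit computes its activation, using made-up weights for illustration (your trained values will differ; use or_net.get_weights("output") to see the real ones):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical weights: two input weights plus the bias weight.
w1, w2, bias = 5.0, 5.0, -2.0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    net_input = w1 * x1 + w2 * x2 + bias * 1.0   # bias node always outputs 1.0
    print((x1, x2), round(float(sigmoid(net_input)), 3))

With these weights, only the (0, 0) pattern produces a low activation; any input of 1 pushes the sum well above zero, so the sigmoid output is close to 1.0, which is exactly the OR function.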
For very large networks, it is useful to see a visual depiction of the network, showing all of the layers and how they are connected. To do this in conx, simply evaluate the Network object's name. You can also watch the network propagate patterns one by one through this visualization. Colors represent the activation level of each node: the blacker the color, the closer the activation is to 1.0; the redder the color, the closer the activation is to -1.0; and the whiter the color, the closer the activation is to 0.0.
or_net
from time import sleep
for pattern in or_net.dataset.inputs:
    or_net.propagate(pattern)
    sleep(1.0)
When training very large networks, it is useful to save the final state of the trained weights and reload them later, rather than re-training from scratch each time.
print("Trained weights", or_net.get_weights("output"))
or_net.save("and_net") # save the network
or_net.reset() # reset the network with new random weights
print("Random weights", or_net.get_weights("output"))
or_net.load("and_net") # load the saved network
print("Restored weights", or_net.get_weights("output"))
Now that you've seen how to use conx, create your own network to solve the logical AND problem, where AND(0,0)=0, AND(0,1)=0, AND(1,0)=0, and AND(1,1)=1. Be sure to inspect the weights after training, and explain how the network has solved the problem.
# Create the and_net
# Create the and_net dataset
# Train the and_net
# Test the and_net
# Inspect the and_net weights and explain them
The XOR logic problem is harder to solve than the previous two problems. Recall that XOR(0,0)=0, XOR(0,1)=1, XOR(1,0)=1, and XOR(1,1)=0. It cannot be solved without adding another layer of nodes into the network. Any layer of nodes between the input and output layers is typically called a hidden layer. If you need multiple hidden layers, you must give them unique names.
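To see why a single layer of weights is not enough, here is a quick brute-force sketch in plain Python (not part of conx): it searches a grid of weights for a single linear unit that classifies all four XOR patterns, and finds none, because XOR is not linearly separable.

import numpy as np

patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
found = False
for w1 in np.linspace(-5, 5, 21):
    for w2 in np.linspace(-5, 5, 21):
        for b in np.linspace(-5, 5, 21):
            # a single unit is "correct" if its net input is positive
            # exactly on the patterns whose target is 1
            if all((w1 * x1 + w2 * x2 + b > 0) == bool(t)
                   for (x1, x2), t in patterns):
                found = True
print("Single-unit solution found?", found)   # False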
xor_net = Network("xor")
xor_net.add(Layer("input", 2))
xor_net.add(Layer("hidden", 2, activation="sigmoid"))
xor_net.add(Layer("output", 1, activation="sigmoid"))
xor_net.connect()
xor_net.compile(loss="mse", optimizer=SGD(lr=0.1, momentum=0.9))
xor_net.model.summary()
Explain why there are 9 parameters in this XOR network.
Rather than adding the training patterns one at a time as we did previously, you can instead create a list of all of the (input, target) pattern pairs, as shown below.
dataset = [
    ([0, 0], [0]),
    ([0, 1], [1]),
    ([1, 0], [1]),
    ([1, 1], [0])
]
xor_net.set_dataset(dataset)
xor_net.dataset.summary()
xor_net.reset()
xor_net.train(epochs=2000, accuracy=0.9, tolerance=0.2, report_rate=50)
xor_net.test(tolerance=0.2)
Go back up to the cell where xor_net was defined, and comment out the line that adds the hidden layer. Re-train and re-test the network. How does it do?
Be sure to reinstate this line after you are done.
To inspect all of the XOR weights you need to look at the weights coming into the hidden layer, as well as the weights coming into the output layer. Try to make sense of how the network is solving the XOR problem.
xor_net.get_weights("hidden")
xor_net.get_weights("output")
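If the trained weights are hard to interpret, here is one classic solution, sketched with made-up weights (your trained values will differ): one hidden unit acts roughly as an OR detector, the other as an AND detector, and the output unit computes "OR and not AND", which is exactly XOR.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical weights for illustration only.
w_hidden = np.array([[6.0, 6.0],     # input 0 -> hidden 0, hidden 1
                     [6.0, 6.0]])    # input 1 -> hidden 0, hidden 1
b_hidden = np.array([-3.0, -9.0])    # hidden 0 ~ OR detector, hidden 1 ~ AND detector
w_output = np.array([8.0, -8.0])     # output ~ "OR and not AND"
b_output = -4.0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    hidden = sigmoid(np.array(x) @ w_hidden + b_hidden)
    output = sigmoid(hidden @ w_output + b_output)
    print(x, round(float(output), 2))   # 0.03, 0.96, 0.96, 0.03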
Be sure to save this notebook before moving on.