In this notebook you will learn how to use conx to create, train, and test simple neural networks. The conx library is built on top of keras, which is in turn built on top of tensorflow.
The following sections walk through the steps needed to create a neural network in conx.
For our first conx network, we will learn logical OR, which takes two input values representing True/False and outputs True/False.
from conx import *
or_net = Network("or")
or_net.add(Layer("input", 2))
or_net.add(Layer("output", 1, activation = "sigmoid"))
or_net.connect()
or_net.compile(loss="mse", optimizer=SGD(lr=0.1, momentum=0.9))
or_net.model.summary()
The number of parameters in a network is based on the sizes of its layers. Each layer automatically has an additional node, called the bias, which is fully connected to the next layer. So the number of parameters between any two layers is given by the formula: (size(layer1) + 1) * size(layer2). For this simple network, that is (size(input) + 1) * size(output) = (2 + 1) * 1 = 3.
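As a quick sanity check, here is a short plain-Python sketch (not a conx call) that applies this formula:

def num_params(size_from, size_to):
    # weights from every node in the first layer, plus the bias node,
    # to every node in the second layer
    return (size_from + 1) * size_to

print(num_params(2, 1))   # 3, matching the summary above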
In order to train a neural network you will need to define a dataset. For small datasets, you can manually add pairs of input and target patterns one at a time. We will use two inputs (with 1 representing True and 0 representing False) and output the OR of the two inputs, where OR(0,0)=0, OR(0,1)=1, OR(1,0)=1, and OR(1,1)=1. Calling the clear method first ensures that you won't add duplicate patterns as you re-run cells in the notebook.
or_net.dataset.clear()
or_net.dataset.add([0,0], [0])
or_net.dataset.add([0,1], [1])
or_net.dataset.add([1,0], [1])
or_net.dataset.add([1,1], [1])
or_net.dataset.summary()
Typically, you would divide the dataset into two subsets, one for training and another for validation. After each pass through the training set, the weights of the network are held fixed while the validation set is evaluated, providing feedback on how well the network is learning. This works best when the validation set is distinct from the training set. For small datasets, as with the OR problem, we don't have enough examples to create two separate sets. If no validation set is provided, the training set is also used as the validation set.
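As a minimal illustration of the idea in plain Python (for the OR problem we will not actually split the data):

# reserve the last 25% of the (input, target) pairs for validation
pairs = [([0,0],[0]), ([0,1],[1]), ([1,0],[1]), ([1,1],[1])]
split = int(0.75 * len(pairs))
train_pairs, val_pairs = pairs[:split], pairs[split:]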
Use the reset method, prior to training, to ensure that the network's parameters are re-initialized to random values. Then the train method is called. As used below, train takes the following parameters: epochs, the maximum number of passes through the training set; accuracy, the fraction of correct patterns at which training stops; tolerance, how close an output must be to its target to count as correct; and report_rate, how often (in epochs) progress is reported.
This network should train successfully in about 200 epochs. Notice that training error decreases and accuracy increases over the course of training.
or_net.reset()
or_net.train(epochs=500, accuracy=0.9, tolerance=0.2, report_rate=10)
Just as with training, when you test a network you can specify a tolerance level. You should use the same tolerance level when testing as you did when training.
or_net.test(tolerance=0.2)
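Conceptually, the tolerance check works something like this sketch (plain Python, not conx internals): an output counts as correct when it is within the tolerance of its target.

def within_tolerance(output, target, tolerance=0.2):
    return abs(output - target) <= tolerance

print(within_tolerance(0.85, 1.0))   # True:  |0.85 - 1.0| = 0.15 <= 0.2
print(within_tolerance(0.70, 1.0))   # False: |0.70 - 1.0| = 0.30 >  0.2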
Because the OR network is so small, we can easily look at all of its weights. You can view the weights of the network one layer at a time. The following command displays the weights coming into the output layer.
or_net.get_weights("output")
The last value is the weight from the bias node. Remember that the bias node has a constant input value of 1.0. Do these weights make sense to you? How do they work to correctly implement the OR function in the network?
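As a hint, here is a sketch of how a single sigmoid output unit computes its activation, using made-up weights for illustration (your trained values will differ; use or_net.get_weights("output") to see the real ones):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical weights: two input weights plus the bias weight.
w1, w2, bias = 5.0, 5.0, -2.0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    net_input = w1 * x1 + w2 * x2 + bias * 1.0   # bias node always outputs 1.0
    print((x1, x2), round(float(sigmoid(net_input)), 3))

With these weights, only the (0, 0) pattern produces a low activation; any input of 1 pushes the sum well above zero, so the sigmoid output is close to 1.0, which is exactly the OR function.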
For very large networks, it is useful to see a visual depiction of the network, showing all of the layers and how they are connected. To do this in conx, simply evaluate the Network object's name. You can also watch the network propagate patterns one by one through this visualization. Colors represent the activation level of each node: the blacker the color, the closer the activation is to 1.0; the redder the color, the closer the activation is to -1.0; and the whiter the color, the closer the activation is to 0.0.
or_net
from time import sleep
for pattern in or_net.dataset.inputs:
    or_net.propagate(pattern)
    sleep(1.0)
When training very large networks, it is useful to save the final state of the trained weights and reload them later, rather than re-training from scratch each time.
print("Trained weights", or_net.get_weights("output"))
or_net.save("and_net") # save the network
or_net.reset() # reset the network with new random weights
print("Random weights", or_net.get_weights("output"))
or_net.load("and_net") # load the saved network
print("Restored weights", or_net.get_weights("output"))
Now that you've seen how to use conx, create your own network to solve the logical AND problem, where AND(0,0)=0, AND(0,1)=0, AND(1,0)=0, and AND(1,1)=1. Be sure to inspect the weights after training, and explain how the network has solved the problem.
# Create the and_net
# Create the and_net dataset
# Train the and_net
# Test the and_net
# Inspect the and_net weights and explain them
The XOR logic problem is harder to solve than the previous two problems. Recall that XOR(0,0)=0, XOR(0,1)=1, XOR(1,0)=1, and XOR(1,1)=0. It cannot be solved without adding another layer of nodes into the network. Any layer of nodes between the input and output layers is typically called a hidden layer. If you need multiple hidden layers, you must give them unique names.
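To see why a single layer of weights is not enough, here is a quick brute-force sketch in plain Python (not part of conx): it searches a grid of weights for a single linear unit that classifies all four XOR patterns, and finds none, because XOR is not linearly separable.

import numpy as np

patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
found = False
for w1 in np.linspace(-5, 5, 21):
    for w2 in np.linspace(-5, 5, 21):
        for b in np.linspace(-5, 5, 21):
            # a single unit is "correct" if its net input is positive
            # exactly on the patterns whose target is 1
            if all((w1 * x1 + w2 * x2 + b > 0) == bool(t)
                   for (x1, x2), t in patterns):
                found = True
print("Single-unit solution found?", found)   # False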
xor_net = Network("xor")
xor_net.add(Layer("input", 2))
xor_net.add(Layer("hidden", 2, activation="sigmoid"))
xor_net.add(Layer("output", 1, activation="sigmoid"))
xor_net.connect()
xor_net.compile(loss="mse", optimizer=SGD(lr=0.1, momentum=0.9))
xor_net.model.summary()
Explain why there are 9 parameters in this XOR network.
Rather than adding the training patterns one at a time as we did previously, you can instead create a list of all of the (input, target) pattern pairs, as shown below.
dataset = [
    ([0, 0], [0]),
    ([0, 1], [1]),
    ([1, 0], [1]),
    ([1, 1], [0])
]
xor_net.set_dataset(dataset)
xor_net.dataset.summary()
xor_net.reset()
xor_net.train(epochs=2000, accuracy=0.9, tolerance=0.2, report_rate=50)
xor_net.test(tolerance=0.2)
Go back up to the cell where xor_net was defined, and comment out the line that adds the hidden layer. Re-train and re-test the network. How does it do?
Be sure to reinstate this line after you are done.
To inspect all of the XOR weights you need to look at the weights coming into the hidden layer, as well as the weights coming into the output layer. Try to make sense of how the network is solving the XOR problem.
xor_net.get_weights("hidden")
xor_net.get_weights("output")
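If the trained weights are hard to interpret, here is one classic solution, sketched with made-up weights (your trained values will differ): one hidden unit acts roughly as an OR detector, the other as an AND detector, and the output unit computes "OR and not AND", which is exactly XOR.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical weights for illustration only.
w_hidden = np.array([[6.0, 6.0],     # input 0 -> hidden 0, hidden 1
                     [6.0, 6.0]])    # input 1 -> hidden 0, hidden 1
b_hidden = np.array([-3.0, -9.0])    # hidden 0 ~ OR detector, hidden 1 ~ AND detector
w_output = np.array([8.0, -8.0])     # output ~ "OR and not AND"
b_output = -4.0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    hidden = sigmoid(np.array(x) @ w_hidden + b_hidden)
    output = sigmoid(hidden @ w_output + b_output)
    print(x, round(float(output), 2))   # 0.03, 0.96, 0.96, 0.03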
Be sure to save this notebook before moving on.