Neural Networks for Digit Recognition with Pybrain

Hi everyone
As part of my B.Tech project, we were required to build, among other things, a neural network that can train on given data and perform the task of digit recognition. We chose Python for the project, given its wide array of libraries.

We aim to identify digits from images. The dataset is a subset of the MNIST database and is provided in the online course Machine Learning on Coursera. The images are 20×20 pixels, and the pixel values are used as features. This is a classification problem with 10 output classes. We use the PyBrain implementation of neural networks to build our network.

Let us first import all the modules required.

import os
import sys
from numpy import *
from scipy import io
import matplotlib.pyplot as plt
from pybrain.structure import *
from pybrain.datasets import SupervisedDataSet
from pybrain.utilities import percentError
from pybrain.supervised.trainers import BackpropTrainer

Then we load the data from the .mat file. The number of classes is determined by the unique values taken by Y. Since the digit '0' is denoted by the class value 10 in this dataset, we map it back to 0 first.

data = io.loadmat('ex4data1.mat')
X = data['X']
[m, n] = shape(X)
Y = data['y']
Y = reshape(Y, (len(Y), -1))
numLabels = len(unique(Y))
Y[Y == 10] = 0
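
A quick sanity check (an illustrative addition, not part of the original script) confirms the remapping:

print(unique(Y))    # should print [0 1 2 3 4 5 6 7 8 9]
print(sum(Y == 0))  # count of '0' examples, formerly labelled 10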

Also, we add a bias term to the feature matrix.

X = hstack((ones((m, 1)), X))
n = n+1
>>> X
array([[ 1., 0., 0., ..., 0., 0., 0.],
       [ 1., 0., 0., ..., 0., 0., 0.],
       [ 1., 0., 0., ..., 0., 0., 0.],
       ...,
       [ 1., 0., 0., ..., 0., 0., 0.],
       [ 1., 0., 0., ..., 0., 0., 0.],
       [ 1., 0., 0., ..., 0., 0., 0.]])
>>> Y
array([[0],
       [0],
       [0],
       ...,
       [9],
       [9],
       [9]], dtype=uint8)
>>> shape(X)
(5000, 401)
>>> shape(Y)
(5000, 1)

Our implementation contains 3 layers, viz. an input layer, a hidden layer, and an output layer, although adding more hidden layers is relatively easy with PyBrain (a sketch of a deeper variant follows the network construction below). The size of the hidden layer can be set as per requirement. I have seen people set it to 'one neuron per output class' or 'one neuron per input feature'. Either will do, or indeed almost any number, but a larger hidden layer tends to give better accuracy at the cost of more training time.


nInput = n
nHidden0 = int(n / 5)
nOutput = numLabels

PyBrain allows us to specify each layer separately for finer control. Once the layers are defined, we create a FeedForwardNetwork and add the layers as modules to the network. Again, we have the liberty to choose how the layers are interconnected. After adding the interconnections, we call sortModules(), which sets everything in place to form a working neural network.

inLayer = LinearLayer(nInput)
hiddenLayer = SigmoidLayer(nHidden0)
outLayer = SoftmaxLayer(nOutput)

net = FeedForwardNetwork()
net.addInputModule(inLayer)
net.addModule(hiddenLayer)
net.addOutputModule(outLayer)

theta1 = FullConnection(inLayer, hiddenLayer)
theta2 = FullConnection(hiddenLayer, outLayer)

net.addConnection(theta1)
net.addConnection(theta2)

net.sortModules()
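
As noted earlier, going deeper is mostly a matter of adding more modules and connections. A minimal sketch of a two-hidden-layer variant; the second hidden layer's size is an arbitrary choice for illustration, not a value from the original post:

deepNet = FeedForwardNetwork()
inL = LinearLayer(nInput)
h0 = SigmoidLayer(int(n / 5))
h1 = SigmoidLayer(int(n / 10))   # illustrative size for the extra layer
outL = SoftmaxLayer(nOutput)
deepNet.addInputModule(inL)
deepNet.addModule(h0)
deepNet.addModule(h1)
deepNet.addOutputModule(outL)
deepNet.addConnection(FullConnection(inL, h0))
deepNet.addConnection(FullConnection(h0, h1))
deepNet.addConnection(FullConnection(h1, outL))
deepNet.sortModules()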

At this point, our network is wired up but not yet trained, so we test it on a random example. The image corresponding to the example is shown, and a value is predicted with the randomly initialised parameters; this will most likely be a mismatch.

c = random.randint(0, X.shape[0])  # numpy's randint: the high end is exclusive
print("testing without training \nchoosing a random number " + str(c))
X1 = X[c, :]
prediction = net.activate(X1)      # forward pass with the initial random weights
p = argmax(prediction, axis=0)
plotData(X, Y, c)
print("predicted output is \t" + str(p))
>>>
testing without training
choosing a random number 1497
true number is [2]
predicted output is     5

The plotData function is written as

def plotData(X, Y, c):
    m, n = shape(X)
    image = array(X[c, 1:n])  # drop the bias term before reshaping
    plt.imshow((image.reshape(20, 20)).T, cmap='Greys')
    plt.show()
    print("true number is " + str(Y[c]))

Now, to train the network on the real dataset, we need to create an object of a dataset class. Although this is a classification problem and we should be using a 'ClassificationDataSet', I ran into some problems with it and instead set the task up as a supervised problem with multiple outputs, each taking only two values ('1' for yes and '0' for no). The target Y was converted to this one-of-many (one-hot) encoding, matching the softmax output layer, with the function 'convertToOneOfMany'.

def convertToOneOfMany(Y):
    rows, cols = shape(Y)
    numLabels = len(unique(Y))
    Y2 = zeros((rows, numLabels))
    for i in range(0, rows):
        Y2[i, Y[i]] = 1
    return Y2

allData = SupervisedDataSet(n, numLabels)
Y2 = convertToOneOfMany(Y)

allData.setField('input', X)
allData.setField('target', Y2)
>>> Y2
array([[ 1., 0., 0., ..., 0., 0., 0.],
       [ 1., 0., 0., ..., 0., 0., 0.],
       [ 1., 0., 0., ..., 0., 0., 0.],
       ...,
       [ 0., 0., 0., ..., 0., 0., 1.],
       [ 0., 0., 0., ..., 0., 0., 1.],
       [ 0., 0., 0., ..., 0., 0., 1.]])
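
For reference, the ClassificationDataSet route would normally look like the sketch below. This is the standard PyBrain idiom, shown here untested, since as mentioned I had problems with it in this project:

from pybrain.datasets import ClassificationDataSet

# Hypothetical alternative; the SupervisedDataSet approach above is
# what was actually used in this post.
allDataC = ClassificationDataSet(n, 1, nb_classes=numLabels)
for i in range(m):
    allDataC.addSample(X[i], Y[i])
allDataC._convertToOneOfMany()  # expands targets to one-of-many encoding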

Any prudent machine-learning implementation takes care to avoid overfitting, so we too divide the dataset into a training set and a test set. The following command shuffles the data and splits it in the given proportion.

dataTrain, dataTest = allData.splitWithProportion(0.70)
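
We can sanity-check the sizes of the two splits (an illustrative check, not in the original script):

print(len(dataTrain), len(dataTest))  # expect roughly 3500 and 1500 of the 5000 examples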

For training, we set up a back-propagation trainer, passing it the feedforward network (the network to be modified during training) and the training dataset. Other parameters are optional and can be looked up in the docs. We store the true labels of the examples so that we can compute accuracy. The network can be trained all at once or step by step to watch the accuracy rise.

train = BackpropTrainer(net, dataset=dataTrain, learningrate=0.1, momentum=0.1)
trueTrain = dataTrain['target'].argmax(axis=1)
trueTest = dataTest['target'].argmax(axis=1)

EPOCHS = 20
for i in range(EPOCHS):
    train.trainEpochs(1)  # one pass over the training data

    outTrain = net.activateOnDataset(dataTrain)
    outTrain = outTrain.argmax(axis=1)
    resTrain = 100 - percentError(outTrain, trueTrain)  # percentError returns % of mismatches

    outTest = net.activateOnDataset(dataTest)
    outTest = outTest.argmax(axis=1)
    resTest = 100 - percentError(outTest, trueTest)

    print("epoch: %4d " % train.totalepochs, "\ttrain acc: %5.2f%% " % resTrain, "\ttest acc: %5.2f%%" % resTest)

We train the network step by step and check the accuracy on the training and test datasets after each step. EPOCHS denotes the number of training cycles and can be changed to see when convergence takes place. Finally, we check the same test image as before.

prediction = net.activate(X1)
print(prediction)
p = argmax(prediction, axis=0)
print("predicted output is \t" + str(p))
>>>
epoch:  1  train acc:  82.37%  test acc: 78.53%
epoch:  2  train acc:  92.66%  test acc: 87.00%
epoch:  3  train acc:  95.11%  test acc: 88.87%
epoch:  4  train acc:  95.00%  test acc: 89.67%
epoch:  5  train acc:  95.89%  test acc: 88.93%
epoch:  6  train acc:  99.43%  test acc: 90.93%
epoch:  7  train acc:  99.63%  test acc: 90.87%
epoch:  8  train acc:  99.91%  test acc: 90.47%
epoch:  9  train acc:  99.91%  test acc: 91.13%
epoch: 10  train acc:  99.97%  test acc: 90.93%
epoch: 11  train acc: 100.00%  test acc: 90.87%
epoch: 12  train acc: 100.00%  test acc: 91.27%
epoch: 13  train acc: 100.00%  test acc: 91.33%
epoch: 14  train acc: 100.00%  test acc: 91.07%
epoch: 15  train acc: 100.00%  test acc: 91.13%
epoch: 16  train acc: 100.00%  test acc: 91.13%
epoch: 17  train acc: 100.00%  test acc: 91.27%
epoch: 18  train acc: 100.00%  test acc: 91.33%
epoch: 19  train acc: 100.00%  test acc: 91.33%
epoch: 20  train acc: 100.00%  test acc: 91.20%

[  1.59597297e-05   2.25560490e-07   9.97647093e-01   6.55645310e-06
   7.44035909e-08   9.97930857e-07   3.73833112e-07   8.80468650e-05
   9.46062824e-04   1.29460949e-03]

predicted output is     2
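
As an aside, instead of the fixed-epoch loop above, PyBrain's BackpropTrainer can also decide when to stop: trainUntilConvergence holds out part of the training data as a validation set and stops when the validation error stops improving. A minimal sketch, with illustrative parameter values rather than ones from the original post:

# Alternative to the manual loop; validationProportion and maxEpochs
# are illustrative choices.
train2 = BackpropTrainer(net, dataset=dataTrain, learningrate=0.1, momentum=0.1)
trainErrs, valErrs = train2.trainUntilConvergence(validationProportion=0.25,
                                                  maxEpochs=100)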

I hope this post was useful and will help you in designing your own neural network. Feel free to comment with any suggestions or mistakes. The whole code is available at https://github.com/poweltalwar/DeepLearning/blob/master/neuralNets.py

PS: Will leave for a trip to Rajasthan with class on 20th Jan.

Original article: http://www.cnblogs.com/daleloogn/p/4739134.html
