Building a Neural Network Completely From Scratch: Python

In this article, we are going to build an entire Neural Network from scratch, using only the NumPy library, to classify the Fashion MNIST dataset.

Introduction

Neural Networks are exciting tools for building awesome AIs, and they are fun to learn and implement. While there are tons of libraries available to create and train Neural Networks in a few lines of code, building your own Neural Network from scratch will give you a much deeper understanding of the underlying process and inner workings of Neural Networks. In this article, we are going to build an entire Neural Network from scratch, using only the NumPy library, to classify not the classical handwritten digits but the Fashion MNIST dataset. Alright, without further delay, let's get straight into it.

The Fashion MNIST dataset

First, let's discuss the dataset we are using. The Fashion MNIST dataset is popular in computer vision and machine learning and consists of 70,000 grayscale images of clothing and accessories, divided into 10 different classes. Each image is a 28x28 pixel square, which is small compared to other image datasets, yet correctly classifying the images is still a challenging task for machine learning models because the variations that distinguish one class from another are subtle.
Image Source: TensorFlow

Here, each image is 28 x 28 pixels, which together make 784 pixels in total, and these form the input layer of our Neural Network. The dataset is already split into train and test sets, which are available at the following link:

The Architecture

Now let's see how we are going to build our Neural Network. Here is our plan,


Here, with each image in the Fashion-MNIST dataset containing 28x28 pixels, the input layer of our neural network must consist of 784 neurons. For the hidden layer, I have chosen 128 neurons, which is more than enough for detecting patterns within the images. Since there are ten distinct classes of clothing and accessories represented in the dataset, our output layer must contain ten neurons in order to classify each image. Altogether, the Neural Network we are going to build has three layers.
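In terms of the weight matrices we will build later, this 784-128-10 architecture implies the following shapes (a quick sketch for orientation only; wih and who are the variable names used in the class further down):

# Shapes implied by the 784-128-10 architecture (sketch only)
input_neurons, hidden_neurons, output_neurons = 784, 128, 10
wih_shape = (hidden_neurons, input_neurons)   # input -> hidden weights: (128, 784)
who_shape = (output_neurons, hidden_neurons)  # hidden -> output weights: (10, 128)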

Data pre-processing

Proper pre-processing of data is essential before training the neural network. If you are using Jupyter Notebook, you can easily copy the dataset into the working directory. However, if you are using Google Colab instead, you must first upload the dataset to the Colab notebook and obtain the file path before proceeding with pre-processing and training. After the dataset is ready, you can load it to the notebook using the following code:

import numpy as np


with open('fashion_mnist_train.npy', 'rb') as train_data:
    X_train = np.load(train_data)
    y_train = np.load(train_data)
   
with open('fashion_mnist_test.npy', 'rb') as test_data:
    X_test = np.load(test_data)
    y_test = np.load(test_data)

Note that I saved the training and testing data with NumPy, so the files are in the NumPy (.npy) format. This makes it really simple to load the data and split it into training and testing sets with their corresponding labels.
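If you are working in Google Colab, one simple way to get the .npy files into the runtime (this is just one option; mounting Google Drive works too) is the built-in upload widget:

# Only needed in Google Colab: upload the .npy files into the runtime
from google.colab import files
files.upload()  # opens a file picker; uploaded files land in the current working directory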

print(X_train.shape)
print(X_test.shape)

-----

(60000, 28, 28)
(10000, 28, 28)

In the training set we have 60,000 samples, and in the testing set we have 10,000 samples, each a 28 by 28 image.

Now, let's slice the dataset,

X_train = X_train[:5000]
y_train = y_train[:5000]

X_test = X_test[:5000]
y_test = y_test[:5000]

Slicing the dataset makes sense here because we are building this Neural Network to understand it better, not for production use. So here I'm choosing 5,000 samples each from the training and testing data.

Reshaping the dataset

After slicing the dataset, you can see that the shapes of the training and testing sets are (5000, 28, 28) and (5000, 28, 28). Each sample is a two-dimensional 28x28 matrix, not a flattened one-dimensional vector of 784 values. So, to feed the data into the input layer containing 784 neurons, we need to reshape the training set to (5000, 784) and the testing set to (5000, 784). Here is how you can do it.

X_train = X_train.reshape(X_train.shape[0], -1) / 255.0
X_test = X_test.reshape(X_test.shape[0], -1) / 255.0

One more thing to note here is that we divided the entire train and test set by 255. This normalizes the pixel values of the image data to the range 0 to 1. It is a common preprocessing step for image data because it keeps the inputs on a consistent, small scale, which improves the numerical stability of training.
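As a quick sanity check (an optional step, not required for training), you can confirm that the pixel values now sit in the expected range:

# After dividing by 255, pixel values should lie between 0 and 1
print(X_train.min(), X_train.max())  # expected: roughly 0.0 and 1.0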

print(X_train.shape)
print(X_test.shape)

------

(5000, 784)
(5000, 784)

Vectorizing (one-hot encoding) the labels

One more step remaining in the pre-processing is to vectorize the labels, or target values. This is necessary because our output layer produces a vector of 10 values, so each label needs to be represented as a vector of length 10 rather than a single integer. It is really simple to do with the Keras library.

from keras.utils import to_categorical

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

What really happens is that the labels, which contain numbers from 0 to 9 representing each class of clothing and accessories in the Fashion MNIST dataset, are converted into their corresponding one-hot vectors of length 10. For instance, if the value is 3, the one-hot vector will be [0,0,0,1,0,0,0,0,0,0].
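If you would rather stay NumPy-only, in keeping with the from-scratch spirit, a minimal equivalent of to_categorical could look like the sketch below (it assumes the labels are integers from 0 to 9):

def one_hot(labels, num_classes=10):
    """Convert integer labels to one-hot vectors using only NumPy."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1
    return encoded

# y_train = one_hot(y_train)
# y_test = one_hot(y_test)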

Creating the Neural Network Class

Now let's get into the interesting part. If you are not familiar with, or need a recap on, the workings of Neural Networks, backpropagation, and Gradient Descent, we have a whole article about it, and I recommend you read it.

Alright, first let's initialize some instance variables that are necessary like the layers, learning rate, and epochs,

class NN:
    def __init__(self, input_neurons, hidden_neurons, output_neurons, learning_rate, epochs):
       
        # initializing the instance variables
        self.input_neurons = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        self.epochs = epochs
       
        # Links of weights from input layer to hidden layer
        self.wih = np.random.normal(0.0, pow(self.input_neurons, -0.5), (self.hidden_neurons, self.input_neurons))
        self.bih = 0
       
        # Links of weights from hidden layer to output layer
        self.who = np.random.normal(0.0, pow(self.hidden_neurons, -0.5), (self.output_neurons, self.hidden_neurons))
        self.bho = 0

        self.lr = learning_rate # Learning rate

Simple! The thing to note here is the weights and biases: using NumPy, we initialize random weights for each connection between the input layer and the hidden layer (wih) and between the hidden layer and the output layer (who), along with the biases.
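In other words, each weight is drawn from a normal distribution centred at zero whose spread shrinks as the number of incoming connections grows, which keeps the initial weighted sums in a range where the sigmoid does not saturate. With n_in incoming connections, the initialization used in the code corresponds to:

\[w \sim \mathcal{N}\left(0,\ \left(\frac{1}{\sqrt{n_{\mathrm{in}}}}\right)^{2}\right)\]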

Activation Function

A really important concept when it comes to Neural Networks is the activation function. An activation function brings non-linearity to our Neural Network. Complex data like the images we are dealing with contain non-linear relationships, and without an activation function, the network would simply be a linear model, unable to capture them.

Here I'm using the Sigmoid (Logistic) activation function. You can choose any activation function, but I found the sigmoid more suitable here since its output lies between 0 and 1 and can be read as a score for each class.

\[\sigma(x) = \frac{1}{1 +e^{-x}}\]

We also need the derivative of the sigmoid activation function. The full derivation is covered in the backpropagation and Gradient Descent article, so go there if you want to take a look.

\[\sigma'(x) = \sigma(x)\,(1-\sigma(x))\]

Here is how you can implement both of these equations in code,

    def activation(self, z):
        """Returns the sigmoid of z"""
        z = np.clip(z, -500, 500) # Avoid overflow error
        return 1 / (1 + np.exp(-z))

    def sigmoid_derivative(self, z):
        """Returns the derivative of the sigmoid of z"""
        return self.activation(z) * (1 - self.activation(z))
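To build a little intuition (a quick illustrative check, not part of the training code): at z = 0 the sigmoid is 0.5, so its derivative is 0.5 * (1 - 0.5) = 0.25, which is its maximum; for large positive or negative z the derivative shrinks towards zero.

# Illustrative only: evaluate the sigmoid and its derivative at a few points
for z in (-2.0, 0.0, 2.0):
    s = 1 / (1 + np.exp(-z))
    print(f"z = {z:+.1f}  sigmoid = {s:.3f}  derivative = {s * (1 - s):.3f}")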

Forward Propagation

Forward propagation, as we know, is the process of passing the inputs through the network to produce an output. It involves multiplying the input values by the network's weights, adding a bias term, and applying an activation function to produce an output for each neuron in the network. Here is the implementation.

    # Forward propagation
    def forward(self, input_list):
        inputs = np.array(input_list, ndmin=2).T
       
        # Passing inputs to the hidden layer
        hidden_inputs = np.dot(self.wih, inputs) + self.bih

        # Getting outputs from the hidden layer
        hidden_outputs = self.activation(hidden_inputs)

        # Passing inputs from the hidden layer to the output layer
        final_inputs = np.dot(self.who, hidden_outputs) + self.bho

        # Getting output from the output layer
        yj = self.activation(final_inputs)      
       
        return yj

That's it! We are simply passing the inputs from the input layer to the hidden layer and finally to the output layer and returning the result.
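As a quick sanity check (hypothetical, and only possible once the class is fully assembled; the complete listing is at the end of the article), you could pass one flattened image through an untrained network and confirm that you get one sigmoid output per class:

# Sketch: sanity-check the forward pass on a single sample
nn_check = NN(input_neurons=784, hidden_neurons=128, output_neurons=10,
              learning_rate=0.01, epochs=1)
sample_output = nn_check.forward(X_train[0])
print(sample_output.shape)  # expected: (10, 1), one output per class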

Backpropagation and Gradient Descent

Now comes the most important part: backpropagation, which is used to train our Neural Network. Backpropagation is the idea of propagating the errors made by the network backward, all the way to the input layer, so that the weights and biases can be adjusted to better fit the training data. Again, I recommend reading the article on backpropagation if you're not familiar with it.
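In equation form, every weight matrix W is nudged in the direction opposite to the gradient of the error E with respect to W, scaled by the learning rate (self.lr in the code); this is the general gradient descent update rule, and the exact gradient expressions are derived in the backpropagation article:

\[W \leftarrow W - \eta \, \frac{\partial E}{\partial W}\]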

    # Back propagation
    def backprop(self, inputs_list, targets_list):
       
        inputs = np.array(inputs_list, ndmin=2).T
           
        tj = np.array(targets_list, ndmin=2).T # Targets
        # passing inputs to the hidden layer
        hidden_inputs = np.dot(self.wih, inputs) + self.bih

        # Getting outputs from the hidden layer
        hidden_outputs = self.activation(hidden_inputs)
       
        # Passing inputs from the hidden layer to the output layer
        final_inputs = np.dot(self.who, hidden_outputs) + self.bho
       
        # Getting output from the output layer
        yj = self.activation(final_inputs)
       
        # Finding the errors from the output layer
        output_errors = -(tj - yj)
       
        # Finding the error in the hidden layer
        hidden_errors = np.dot(self.who.T, output_errors)

        # Updating the weights using Gradient Descent Update Rule
        self.who -= self.lr * np.dot((output_errors * self.sigmoid_derivative(yj)), np.transpose(hidden_outputs))
        self.wih -= self.lr * np.dot((hidden_errors * self.sigmoid_derivative(hidden_outputs)), np.transpose(inputs))


        #updating bias
        self.bho -= self.lr * (output_errors * self.sigmoid_derivative(yj))
        self.bih -= self.lr * (hidden_errors * self.sigmoid_derivative(hidden_outputs))
        pass

The backprop method takes two parameters: the inputs list, which is the training data, and the targets list, which holds the corresponding labels.

The fit method

This method is used to train the network over a number of epochs (iterations). In this method, we perform the backpropagation and gradient descent steps we defined above.

    # Performing Gradient Descent Optimization using Backpropagation
    def fit(self, inputs_list, targets_list):
        for epoch in range(self.epochs):        
            self.backprop(inputs_list, targets_list)
            print(f"Epoch {epoch}/{self.epochs} completed.")

The predict method

The final method of our Neural Network class is the predict method, which is of course used to perform the prediction using the updated weights and biases.

    def predict(self, X):
        outputs = self.forward(X).T
        return outputs

The predict method simply takes the test data as an argument and performs the forward propagation to produce the result.

That's it! We have coded an entire Neural Network class from scratch.

Training and Testing the Network

Alright! Now let's put everything into action. To train our Neural Network, we need to create an object of the NN class and call the fit method.

nn = NN(input_neurons=784, hidden_neurons=128, output_neurons=10, learning_rate=0.01, epochs=1000)
nn.fit(X_train, y_train)

Testing the Network

# Predicting probabilities
probs = nn.predict(X_test)


# Converting probabilities to one-hot vector format
predictions = []

for prob in probs:
    max_idx = np.argmax(prob)
    prediction = np.zeros_like(prob)
    prediction[max_idx] = 1    
    predictions.append(prediction)
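As a side note, the same conversion can be done in one vectorized step with NumPy (an equivalent alternative to the loop above, not the original approach):

# Vectorized alternative: pick the most probable class per row and one-hot encode it
predictions = np.eye(probs.shape[1])[np.argmax(probs, axis=1)]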

Now let's evaluate the network's performance on the testing data.

from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report


print("Accuracy:",accuracy_score(predictions, y_test))
print("CR:", classification_report(predictions, y_test))

---------

Accuracy: 0.7644

CR:               precision    recall  f1-score   support

           0       0.76      0.78      0.77       498
           1       0.95      0.96      0.96       479
           2       0.71      0.60      0.65       616
           3       0.84      0.73      0.78       576
           4       0.71      0.60      0.65       613
           5       0.81      0.79      0.80       499
           6       0.26      0.65      0.37       195
           7       0.87      0.79      0.83       554
           8       0.91      0.89      0.90       536
           9       0.81      0.89      0.85       434

   micro avg       0.76      0.76      0.76      5000
   macro avg       0.76      0.77      0.76      5000
weighted avg       0.79      0.76      0.77      5000
 samples avg       0.76      0.76      0.76      5000

The accuracy is decent, even if it is not impressive. But for a network built entirely from scratch, 76% accuracy is actually a great result.

Plotting some images with the corresponding predictions

Let's plot some images along with the predictions the network made for them, to see how it performs. Here is how you can do it.

import matplotlib.pyplot as plt

# Class names for the ten Fashion MNIST labels (0 - 9)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

fig, axes = plt.subplots(2, 4, figsize=(10, 6))
for i, ax in enumerate(axes.flat):
    img_data = X_test[i].reshape((28, 28))
    # Display image
    ax.imshow(img_data, cmap='gray')
    ax.set_xticks([])
    ax.set_yticks([])
    index = np.where(predictions[i] == 1)[0][0]
    label = class_names[index]
    true_label = class_names[np.argmax(y_test[i])]
    if label != true_label: # Writing the prediction label as red if it is wrong
        ax.set_xlabel(label, color='r')
    else:
        ax.set_xlabel(label)
plt.show()

The plot will look like this:



The red-colored labels are those that the network predicted wrong.

The full version of the code

class NN:
    def __init__(self, input_neurons, hidden_neurons, output_neurons, learning_rate, epochs):
       
        # initializing the instance variables
        self.input_neurons = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        self.epochs = epochs
       
        # Links of weights from input layer to hidden layer
        self.wih = np.random.normal(0.0, pow(self.input_neurons, -0.5), (self.hidden_neurons, self.input_neurons))
        self.bih = 0
       
        # Links of weights from hidden layer to output layer
        self.who = np.random.normal(0.0, pow(self.hidden_neurons, -0.5), (self.output_neurons, self.hidden_neurons))
        self.bho = 0

        self.lr = learning_rate # Learning rate
       
    def activation(self, z):
        """Returns the sigmoid of z"""
        z = np.clip(z, -500, 500) # Avoid overflow error
        return 1 / (1 + np.exp(-z))

    def sigmoid_derivative(self, z):
        """Returns the derivative of the sigmoid of z"""
        return self.activation(z) * (1 - self.activation(z))
   
    # Forward propagation
    def forward(self, input_list):
        inputs = np.array(input_list, ndmin=2).T
       
        # Passing inputs to the hidden layer
        hidden_inputs = np.dot(self.wih, inputs) + self.bih

        # Getting outputs from the hidden layer
        hidden_outputs = self.activation(hidden_inputs)

        # Passing inputs from the hidden layer to the output layer
        final_inputs = np.dot(self.who, hidden_outputs) + self.bho

        # Getting output from the output layer
        yj = self.activation(final_inputs)      
       
        return yj

   
    # Back propagation
    def backprop(self, inputs_list, targets_list):
       
        inputs = np.array(inputs_list, ndmin=2).T
           
        tj = np.array(targets_list, ndmin=2).T # Targets
        # passing inputs to the hidden layer
        hidden_inputs = np.dot(self.wih, inputs) + self.bih

        # Getting outputs from the hidden layer
        hidden_outputs = self.activation(hidden_inputs)
       
        # Passing inputs from the hidden layer to the output layer
        final_inputs = np.dot(self.who, hidden_outputs) + self.bho
       
        # Getting output from the output layer
        yj = self.activation(final_inputs)
       
        # Finding the errors from the output layer
        output_errors = -(tj - yj)
       
        # Finding the error in the hidden layer
        hidden_errors = np.dot(self.who.T, output_errors)

        # Updating the weights using Gradient Descent Update Rule
        self.who -= self.lr * np.dot((output_errors * self.sigmoid_derivative(yj)), np.transpose(hidden_outputs))
        self.wih -= self.lr * np.dot((hidden_errors * self.sigmoid_derivative(hidden_outputs)), np.transpose(inputs))


        #updating bias
        self.bho -= self.lr * (output_errors * self.sigmoid_derivative(yj))
        self.bih -= self.lr * (hidden_errors * self.sigmoid_derivative(hidden_outputs))
        pass

    # Performing Gradient Descent Optimization using Backpropagation
    def fit(self, inputs_list, targets_list):
        for epoch in range(self.epochs):        
            self.backprop(inputs_list, targets_list)
            print(f"Epoch {epoch}/{self.epochs} completed.")
           
    def predict(self, X):
        outputs = self.forward(X).T
        return outputs

Articles to read:


Thanks for reading!

If you have any questions or queries, feel free to ask in the comment box.