The Perceptron Algorithm: From Scratch Using Python

In this article, we'll explore the basics of the perceptron algorithm and provide a step-by-step guide to implementing it in Python from scratch.

Introduction

The Perceptron algorithm is one of the earliest and most fundamental algorithms in machine learning, and it laid the foundation for many of the more complex and sophisticated algorithms developed over the years. Understanding the Perceptron is especially important if you are getting started with Deep Learning and Neural Networks, since it is one of the basic building blocks of Neural Networks. In this article, we'll explore the basics of the perceptron algorithm and provide a step-by-step guide to implementing it in Python from scratch. We'll also classify the Iris dataset using the Perceptron we just built from scratch. So let's get started!

What is the Perceptron and how was it developed?

The concept behind the Perceptron was first introduced in 1943 by Warren McCulloch and Walter Pitts, who proposed a simple mathematical model of a biological neuron. Later, Frank Rosenblatt, an American psychologist and computer scientist, built the first Perceptron machine for performing image recognition while doing research at the Cornell Aeronautical Laboratory. This was not a program but a machine, built with 400 photocells and potentiometers. During this time the Perceptron machine became popular in the AI community and was considered a fundamental part of building intelligent systems.

The perceptron algorithm is based on the concept of a single neuron in the human brain. A single neuron does a very simple thing: it receives some inputs, and if those inputs are strong enough, it activates and sends a signal to the next neuron. The perceptron was designed to mimic this process, with the input data serving as the input to the neuron and the weights representing the strength of the connections between the input neurons and the output neuron.

In the early days of artificial intelligence research, the perceptron algorithm was seen as a promising approach for building intelligent systems. However, it was later discovered that the perceptron has some limitations, such as its inability to learn certain types of non-linear patterns in the data. These limitations led to the development of other machine learning algorithms, such as Support Vector Machines (SVMs), which can learn more complex patterns in the data. Even so, the Perceptron is still considered one of the most important algorithms in Artificial Intelligence and Machine Learning, since it is the foundation of some of the powerful algorithms we use today.

The Perceptron Algorithm: How does it work?

The Perceptron is a type of linear classifier (a binary classifier), which means it can be used to classify data that is linearly separable. A Perceptron may look similar to Logistic Regression at first glance, but it is different: while Logistic Regression predicts the probability of a data point falling in a particular class, the Perceptron only tells whether the data point is in a particular class or not, just like saying "Yes" or "No".

Here is the diagrammatic representation of the perceptron algorithm.

Perceptron (TLU) diagram


A Perceptron is a kind of single artificial neuron, also known as a Threshold Logic Unit (TLU). As you can see in the above diagram, the Perceptron has input links X1, X2, and X3, and each input has a corresponding weight W1, W2, and W3. These weights are the heart of the Perceptron: they determine the strength of each input signal.

The Perceptron or TLU computes the weighted sum of the inputs (z = X1W1 + X2W2 + ... + XnWn), and this weighted sum is then passed through an activation function, also known as a step function. The activation function determines whether the Perceptron activates or not.

Let's see an example to understand this,


In the above example, three inputs are given to the Perceptron; the weighted sum of the inputs is calculated and comes out to 0.22. This is passed through an activation called the Heaviside activation function (which we are going to discuss next).

But you may have noticed that one of the inputs to the perceptron is zero. This can be a problem during training: changing that input's weight has no effect, since the input is still zero. To handle this, we add a new term to the equation, known as the bias. The bias helps shift the activation to the left or right during the training of the Perceptron algorithm.

So the equation with the bias term looks like this,

\[z = (\mathbf{x} \cdot \mathbf{w}) + \text{bias}\]
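To make this concrete, here is a minimal sketch of the computation in NumPy. The input and weight values below are illustrative assumptions chosen so that the weighted sum comes out to 0.22, like in the example above; they are not taken from the original figure.

import numpy as np

x = np.array([1.0, 0.6, 0.0]) # illustrative inputs (note the third input is zero)
w = np.array([0.1, 0.2, 0.5]) # illustrative weights
b = 0.0                       # bias term

z = np.dot(x, w) + b          # weighted sum plus bias
print(z)                      # 0.22
print(np.heaviside(z, 1))     # 1.0 -> the Perceptron activates

Notice that changing w[2] has no effect on z as long as x[2] is zero, which is exactly why the bias term is useful.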


Activation functions

Activation functions are mathematical functions used in Perceptrons to determine the output given the input. As we said, they determine whether the neuron (Perceptron) needs to be activated or not. An activation function takes in the weighted sum of the input data and produces an output that can be used for prediction.

Activation functions are an essential part of Perceptrons and neural networks because they allow the model to learn and make decisions based on the input data. They also help to introduce non-linearity into the model, which is necessary for learning more complex relationships in the data.

Some common types of activation functions used in Perceptrons are the Sign function, Heaviside function, Sigmoid function, ReLU function, etc.

The Heaviside and Sign functions are the ones commonly used with Perceptrons, so let's understand what these activation functions do.

Heaviside function

The Heaviside activation function returns 0 when the weighted sum of inputs is less than zero, and returns 1 if it is greater than or equal to 0.

\[\text{heaviside}(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases}\]

Sign function

The Sign function returns 0 if the weighted sum of inputs is 0, and returns +1 or -1 when the weighted sum is greater than or less than 0, respectively.

\[\text{sgn}(z) = \begin{cases} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ +1 & \text{if } z > 0 \end{cases}\]
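Both functions are easy to express in NumPy, as the short sketch below shows. Note that np.heaviside takes a second argument specifying the value to return when z is exactly 0, and np.sign implements the sign function described above.

import numpy as np

z = np.array([-0.5, 0.0, 0.22])

print(np.heaviside(z, 1)) # [0. 1. 1.] -> the second argument is the output at z == 0
print(np.sign(z))         # [-1.  0.  1.]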

Learning in Perceptron

Alright, we have discussed how the Perceptron really works, but how can it be trained? The whole point of training the Perceptron is to adjust the weights of each input according to the training data. To find the best weights, Frank Rosenblatt proposed an algorithm largely inspired by Hebb's rule. Hebb's rule states that when a biological neuron triggers another neuron, the connection between these two neurons grows stronger. There is an awesome phrase for this property of neurons: "Neurons that fire together wire together."

Inspired by Hebb's rule, the Perceptron is trained by measuring the error the algorithm makes and finding the weights that reduce this error as much as possible, which is the main goal of training a machine learning algorithm in general. The algorithm first makes a prediction on the given training data, and this prediction is compared with the actual desired output. The error is calculated for each training instance by comparing the prediction with the actual output, and during training the weights are adjusted so that this error is reduced.

Perceptron training demo diagram

The learning (training) rule can be written as

\[w_{i,j}^{\text{(next)}} = w_{i,j} + \eta\,(y_j - \hat{y}_j)\,x_i\]

where w_{i,j} is the weight between the ith input neuron and the jth output neuron,
η (eta) is the learning rate,
y_j is the target output (the actual value) of the jth output neuron,
ŷ_j is the predicted value of the jth output neuron, and
x_i is the ith input value of the current training instance.

The algorithm is trained by updating the weights until it reaches optimal results. The learning rate (η) determines the size of each step taken toward the optimal weights. A smaller learning rate often reaches a lower error, but sometimes a larger learning rate is preferred to make faster progress.
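Here is a minimal sketch of a single weight update with this rule, reusing the illustrative numbers from the earlier example (the target value is made up so that a misclassification, and therefore an update, actually occurs):

import numpy as np

eta = 0.1                     # learning rate
x = np.array([1.0, 0.6, 0.0]) # one training instance
w = np.array([0.1, 0.2, 0.5]) # current weights
b = 0.0                       # current bias
y = 0                         # target output (the prediction below will be 1, so an update occurs)

y_hat = np.heaviside(np.dot(x, w) + b, 1) # prediction: heaviside(0.22) = 1
w = w + eta * (y - y_hat) * x             # [0.   0.14 0.5 ]
b = b + eta * (y - y_hat)                 # -0.1
print(w, b)

Note that w[2] is untouched because x[2] is zero, while the bias still moves; this is the behavior we discussed when introducing the bias term.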

Implementing Perceptron in Python

So far we have discussed what the Perceptron is, how it works, and its learning rule. Now let's tie all of that together using Python to see it in action.

We are building the Perceptron class from scratch using only NumPy; Scikit-learn is used only for loading the dataset and evaluating the results. So first let's create the Perceptron class and initialize the instance variables we need,

import numpy as np

class Perceptron:
   
    def __init__(self, learning_rate, epochs):
        self.weights = None
        self.bias = None
        self.learning_rate = learning_rate
        self.epochs = epochs

Here we have initialized some instance variables: the weights, the bias, the learning rate, and the number of epochs (iterations). Next, we define the activation function method,

    # Heaviside activation function
    def activation(self, z):
        # np.heaviside's second argument is the output at exactly z == 0,
        # so heaviside(z, 1) returns 0 when z < 0 and 1 when z >= 0
        return np.heaviside(z, 1)

The Heaviside activation method only takes one parameter, which is the weighted sum of inputs z, and returns the corresponding output.

Now let's come to the main section: training the Perceptron,

    def fit(self, X, y):
        n_features = X.shape[1]

        # Initializing weights and bias with zeros
        self.weights = np.zeros((n_features))
        self.bias = 0

        # Iterating until the number of epochs
        for epoch in range(self.epochs):

            # Traversing through the entire training set
            for i in range(len(X)):
                # Weighted sum of the current instance plus the bias
                z = np.dot(X[i], self.weights) + self.bias
                # Passing it through the activation function
                y_pred = self.activation(z)

                # Updating weights and bias using the Perceptron learning rule
                self.weights = self.weights + self.learning_rate * (y[i] - y_pred) * X[i]
                self.bias = self.bias + self.learning_rate * (y[i] - y_pred)

        return self.weights, self.bias

What happens here is really simple: first, we find the number of features in the training data so we can create the weight vector. The weights and the bias are then initialized to zeros. After that, for each training instance, the weighted sum of the inputs is computed and passed through the Heaviside activation function. Finally, the weights and bias are updated by the learning rule, and the resulting values are returned.

Prediction method

    def predict(self, X):
        z = np.dot(X, self.weights) + self.bias
        return self.activation(z)

That's it, we are done with the Perceptron class. So let's get to the main part, which is classifying the Iris dataset.

Classifying Iris dataset using Perceptron

The Iris dataset consists of 150 samples from three species of Iris: Setosa, Versicolor, and Virginica. The first column of the dataset represents the sepal length, the second the sepal width, the third the petal length, and the fourth the petal width. For this classification we only use the first two features, the sepal length and the sepal width.

If you want to know more about Iris classification in general, searching for "Iris dataset classification using Python" will turn up plenty of articles.

Loading the dataset

from sklearn.datasets import load_iris

iris = load_iris()

Splitting the dataset

from sklearn.model_selection import train_test_split

X = iris.data[:, (0, 1)] # sepal length, sepal width
y = (iris.target == 0).astype(int) # 1 for Setosa, 0 for the other two species

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

Here we only consider the sepal length and the sepal width, so X contains just the first two columns of the data. The target y is set to 1 for Setosa and 0 for the other two species, since the Perceptron is a binary classifier. After that, X and y are split into training and testing sets.

Training and making predictions

Alright, now let's train our Perceptron algorithm,

perceptron = Perceptron(learning_rate=0.001, epochs=100)

perceptron.fit(X_train, y_train)

pred = perceptron.predict(X_test)

Now let's see how much accuracy we have got,

from sklearn.metrics import accuracy_score

accuracy_score(y_test, pred)

-------

0.96

That's great, we got an accuracy of 96%! Tuning the learning rate or the number of epochs may give even better results.

Classification report

from sklearn.metrics import classification_report

report = classification_report(y_test, pred, digits=2)
print(report)

------

                precision    recall  f1-score   support

         0.0       0.93      1.00      0.97        43
         1.0       1.00      0.91      0.95        32

    accuracy                           0.96        75
   macro avg       0.97      0.95      0.96        75
weighted avg       0.96      0.96      0.96        75

Classifying Iris dataset using Scikit-learn Perceptron class

That's nice, we have implemented our own Perceptron algorithm from scratch. But why not also try the Scikit-learn Perceptron class?

from sklearn.linear_model import Perceptron # note: this import shadows our from-scratch Perceptron class


sk_perceptron = Perceptron()
sk_perceptron.fit(X_train, y_train)
sk_perceptron_pred = sk_perceptron.predict(X_test)

# Accuracy

accuracy_score(y_test, sk_perceptron_pred)

-----

0.88

Some limitations and how they can be solved

Even though Perceptrons were considered great by researchers, they have some limitations. First of all, the Perceptron is a binary linear classifier, so it cannot learn from really complex data, as needed in natural language processing, speech recognition, etc. Moreover, most linear models cannot solve the XOR problem, and the Perceptron is one of them. Scientists and researchers had expected more from Perceptrons, so these limitations were a great disappointment for some years. Later, after further research, it was found that stacking multiple Perceptrons can solve the XOR problem and most of the more complex problems as well. The result is an Artificial Neural Network known as the Multilayer Perceptron (MLP). MLPs show great success in solving more complex problems and learning really complex patterns in the training data.

The XOR problem,


How MLP solves the XOR problem,
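To see this concretely, here is a small sketch comparing a single perceptron with a tiny MLP on the XOR truth table. It uses Scikit-learn's Perceptron (imported above) and MLPClassifier; the MLP hyperparameters below are illustrative assumptions, not the only ones that work.

from sklearn.neural_network import MLPClassifier

# The XOR truth table
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

# A single perceptron cannot separate XOR, no matter how long it trains
p = Perceptron(max_iter=1000).fit(X_xor, y_xor)
print(accuracy_score(y_xor, p.predict(X_xor))) # typically 0.5

# An MLP with one hidden layer can learn XOR
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                    max_iter=10000, random_state=0).fit(X_xor, y_xor)
print(accuracy_score(y_xor, mlp.predict(X_xor))) # usually 1.0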


We'll discuss Multilayer Perceptrons in detail in the next article.

Conclusion

In conclusion, the perceptron algorithm is a simple and efficient linear classifier that predicts the output class of an input based on a linear combination of its features. It uses an activation function to transform the weighted sum of the input features into an output that can be used for prediction. Perceptron-style algorithms appear in many applications, including natural language processing and image classification, but they have limitations, such as their inability to model complex relationships between the input features and the output class. Despite these limitations, the perceptron is still a valuable tool for solving many machine learning problems.

A huge part of this article is referenced from the book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow". If you want to know more, check it out; it will surely be a great addition to your Machine Learning book collection.

Author: Sidharth