# Regression with Multilayer Perceptron (MLP) Using Python

In this article, we are going to understand how Multi-Layer Perceptrons can be used for Regression tasks and modeling.

## Introduction

The Multilayer Perceptron (MLP) is one of the earliest and most fundamental neural networks, sometimes called the "plain vanilla" neural network. It is classified as a feed-forward network, meaning data flows in only one direction, forward, unlike networks such as Recurrent Neural Networks (RNNs) that can process sequential data. MLPs are popular in tasks like image classification, Natural Language Processing (NLP), time series forecasting, and more. At their core, many of these tasks fall under classification, where the model assigns inputs to categories based on what it has learned. But neural networks are all-rounders: they can also be used for regression tasks, which predict continuous outcomes. In this article, we are going to understand how Multi-Layer Perceptrons can be used for regression tasks and modeling, and what makes them different from other regression models. So, let's begin.

## What is Regression?

In regression, you need to know two terms: the dependent variable and the independent variable. In the context of advertising and revenue, revenue earned is the dependent variable and the amount spent on advertising (across different advertising channels) is the independent variable. But why is that the case, and what does it mean? There is a relationship between advertising and revenue: when advertising spend increases, revenue tends to improve. The dependent variable, revenue earned, is the outcome of interest, and it depends on the amount spent on advertising, which is the independent variable. Advertising, by contrast, does not depend on the revenue earned; it can be controlled and changed as per our requirements. This distinction is one of the most important concepts in regression analysis.

### Linear Regression

You might have heard about Linear Regression when you started your journey into machine learning, because it is known as the "hello world" algorithm of machine learning. The basic idea behind linear regression is to fit a straight line to the data using the slope-intercept equation, y = mx + b, where m is the slope and b is the intercept. It is a popular regression method for datasets where the relationship between variables is close to linear.
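To make the idea concrete, here is a minimal sketch of fitting that straight line by least squares with NumPy. The advertising-style numbers are made up for illustration; the underlying trend is roughly y = 2x:

```python
import numpy as np

# Hypothetical advertising spend (x) and revenue (y) with a near-linear trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Fit y = m*x + b by least squares
m, b = np.polyfit(x, y, deg=1)
print(m, b)  # slope close to 2, intercept close to 0
```

`np.polyfit` with `deg=1` is just one of several ways to do this; `sklearn.linear_model.LinearRegression` would give the same line.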

### Non-linear Regression

Regression problems are not always linear in nature; such problems are known as non-linear regression. Here the relationship between variables is more complex: it might be exponential, logarithmic, or some other non-linear form. Unlike linear regression, non-linear regression models can capture more complex relationships, such as curves or other non-linear shapes.

Non-linear regression models can be more challenging to fit and interpret than linear regression models. The form of the non-linear function must be specified, and the parameters of the function must be estimated from the data.
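As a sketch of what "specifying the form and estimating the parameters" looks like in practice, here is a hypothetical exponential fit using SciPy's `curve_fit`. The true parameters (a = 1.5, b = 0.8) and the noise level are made up for this example:

```python
import numpy as np
from scipy.optimize import curve_fit

# The functional form must be specified up front: here y = a * exp(b * x)
def model(x, a, b):
    return a * np.exp(b * x)

# Synthetic data generated from the model plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 50)
y = 1.5 * np.exp(0.8 * x) + rng.normal(0, 0.05, size=x.shape)

# Estimate the parameters a and b from the data
(a_hat, b_hat), _ = curve_fit(model, x, y)
print(a_hat, b_hat)  # should recover values near 1.5 and 0.8
```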

## How MLP Can Be Used for Regression Problems?

Multi-layer perceptrons (MLPs) are really good at learning complex relationships in a given dataset, so they can be used to solve non-linear regression problems. MLPs consist of multiple layers of interconnected nodes, or neurons. Each neuron in a layer receives input from the previous layer, applies a transformation function to it, and passes the output to the next layer. There are three kinds of layers: the input layer, the hidden layers, and the output layer. The hidden layers can be of any size.
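The "weighted sum, transform, pass along" step can be sketched in plain NumPy. This is a toy 1-16-1 network with random weights purely to show the mechanics; a real network would learn these weights during training:

```python
import numpy as np

# Random weights for illustration only (a trained MLP would learn these)
rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(1, 16)), np.zeros(16)   # input -> hidden
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)    # hidden -> output

def relu(z):
    return np.maximum(0, z)

x = np.array([[0.5]])        # one sample, one feature
hidden = relu(x @ W1 + b1)   # weighted sum + activation
output = hidden @ W2 + b2    # regression output: plain weighted sum
print(output.shape)          # (1, 1): one prediction for one sample
```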

A classification MLP and a regression MLP are not too different, but there are a few differences, of course. First, if you want to predict a single value (say, revenue earned because of advertising), you only need a single output neuron; if you want to predict multiple values, you can add multiple output neurons.

Second, in regression tasks we generally don't apply any activation function to the output layer of the MLP; it just computes the weighted sum and emits it as the output. But if you want the output constrained to a given range, for example between -1 and +1, you can use an activation like tanh (hyperbolic tangent).
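The reason tanh works for this is that it squashes any real input into the open interval (-1, 1), as a quick check shows:

```python
import numpy as np

# tanh maps any real number into (-1, 1), saturating for large inputs
z = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
out = np.tanh(z)
print(out.min() >= -1 and out.max() <= 1)  # True
```

For a range other than (-1, 1) you would rescale the tanh output, or simply leave the output layer linear as discussed above.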

Third, the loss functions commonly used in a regression MLP include Mean Squared Error (MSE) and Mean Absolute Error (MAE). MSE works well on datasets with few outliers, while MAE is a better measure on datasets with many outliers, because squaring amplifies large errors.
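A tiny made-up example shows why: with a single large outlier among the residuals, MSE blows up far more than MAE.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.1, 2.9, 14.0])  # last prediction is an outlier

errors = y_pred - y_true
mse = np.mean(errors ** 2)      # dominated by the outlier: 10^2 = 100
mae = np.mean(np.abs(errors))   # the outlier contributes only 10
print(mse, mae)                 # ~25.0 vs ~2.6
```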

## Performing Regression using MLP

We are now going to implement everything we have discussed. The plan is to generate a synthetic dataset using NumPy and then evaluate how well a neural network identifies the patterns in it. This will help us see how an MLP handles non-linear regression tasks. After that, we will work with a real-world dataset that relates advertising spend to sales.

### Building dataset using NumPy

```python
import numpy as np
import matplotlib.pyplot as plt

# A noisy sine wave: a clearly non-linear pattern
X_train = np.linspace(-10, 10, 1000)
y_train = np.sin(X_train) + np.random.normal(0, 0.2, size=X_train.shape)
X_test = np.linspace(-10, 10, 500)
y_test = np.sin(X_test) + np.random.normal(0, 0.2, size=X_test.shape)
```

Now, if you plot the dataset, you'll see something like this,

```python
plt.scatter(X_train, y_train)
plt.show()
```

The training data follows a wavy pattern; you'll see the same pattern when you plot the test data, though not exactly the same points. A simple linear regression model cannot fit this dataset with low MSE because it is non-linear, but let's see what a neural network can do.

### Creating and Training MLP Model

Let's create our neural network. I'm going to use the Keras API bundled with TensorFlow (tf.keras) to create the network; the standalone Keras package works just as well.

```python
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(62, activation='relu', input_dim=1))
model.add(tf.keras.layers.Dense(62, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='linear'))
```

We created a model whose input layer takes a single feature (input_dim=1), followed by two hidden layers with 62 neurons each. The number of neurons is completely up to you; I selected 62 because it performed well here. Now let's compile and train the model.

```python
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
history = model.fit(X_train, y_train, epochs=500)
```

Nothing here should be unfamiliar: we already discussed MSE and MAE, and the remaining argument is the optimizer, for which I selected Adam.

### Plotting the Regression Curve

Here is how you plot the Regression Curve formed when training the model,

```python
plt.scatter(X_train, y_train)
plt.plot(X_train, model.predict(X_train), color='red')
plt.show()
```

The red curve in the graph is the regression curve formed by our MLP. This is really cool: the network actually finds the wavy pattern in the dataset, and this is basically how regression works with an MLP. Now, let's plot the testing data as well,

```python
pred = model.predict(X_test)
plt.scatter(X_test, y_test)
plt.plot(X_test, pred, color='red')
plt.show()
```

Plotting is a great way to inspect a model, but on its own it doesn't tell you much about a regression model's accuracy, so it is good to also check the MSE on the test set.

```python
from sklearn.metrics import mean_squared_error

pred = model.predict(X_test)
mean_squared_error(y_test, pred)
# 0.04299296078870944
```

That's not bad at all; the lower you can push the MSE, the more effective the model. You can also try a linear regression model for comparison.
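For a rough comparison, here is that linear baseline on a freshly generated copy of the same sine dataset (a fixed seed is used here, so the exact numbers will differ slightly from the run above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Recreate the noisy sine dataset in the 2-D shape sklearn expects
rng = np.random.default_rng(0)
X_train = np.linspace(-10, 10, 1000).reshape(-1, 1)
y_train = np.sin(X_train).ravel() + rng.normal(0, 0.2, size=1000)
X_test = np.linspace(-10, 10, 500).reshape(-1, 1)
y_test = np.sin(X_test).ravel() + rng.normal(0, 0.2, size=500)

# A straight line cannot follow the wave, so its test MSE stays near
# the variance of sin(x) itself, roughly 0.5 -- far above the MLP's ~0.04
lin = LinearRegression().fit(X_train, y_train)
lin_mse = mean_squared_error(y_test, lin.predict(X_test))
print(lin_mse)
```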

## Predicting Sales on Advertising Dataset

OK, let's get into a real-world example: the Advertising Budget and Sales dataset from Kaggle. This dataset contains spend on different forms of advertising and the effect of that advertising on sales. You can download the dataset from Kaggle and use Pandas to load it.

```python
import pandas as pd

dataset = pd.read_csv('Advertising Budget and Sales.csv')
```

### Splitting the dataset

Here is how you can split the dataset into train and test sets,

```python
from sklearn.model_selection import train_test_split

X = dataset.iloc[:, 1:4]
y = dataset.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```