Handwritten Digit Recognition on the MNIST Dataset Using Python



In this article, we get familiar with classification techniques in machine learning and build a machine learning model for recognizing handwritten digits.

Classification can be binary or multiclass. Binary classification involves two values, such as true or false, i.e., a binary classifier only distinguishes between two classes. Whereas binary classifiers work with two classes, multiclass classifiers can distinguish between more than two classes.

This model can be built using multiclass classification algorithms such as Decision Trees, Random Forest, SVM, Logistic Regression, KNN, Naive Bayes, etc. For this model, we are using the Support Vector Machine (SVM) algorithm.

First, we need to load the dataset to work on. Scikit-learn provides many functions to download popular datasets, and here we are using the MNIST dataset.

What is the MNIST dataset?

The MNIST dataset is a collection of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau. Each of these images has its own corresponding label in the dataset.

So now you have an idea of the MNIST dataset. Let's fetch the dataset first.

#loading the dataset

from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, as_frame=False)  # as_frame=False returns NumPy arrays, so a row can be indexed like X[0]

mnist.keys()


dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'DESCR', 'details', 'categories', 'url'])

A dataset loaded by scikit-learn generally has a similar dictionary structure, including:
  • The DESCR key describes the dataset.
  • The data key contains an array with one row per instance and one column per feature.
  • The target key contains an array of the corresponding labels as strings.
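For example, a quick way to confirm what was downloaded is to print the start of the description (the 300-character cut-off here is arbitrary):

# Print the beginning of the dataset description
print(mnist['DESCR'][:300])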

mnist['data']


array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
mnist['target']

array(['5', '0', '4', ..., '4', '5', '6'], dtype=object)
X, y = mnist['data'], mnist['target']
y = y.astype('uint8')  # the labels come as strings, so cast them to integers
X.shape
(70000, 784)
y.shape
(70000,)

The dataset is mnist_784 version 1. This dataset has 70,000 images each with 784 features since each image is 28 x 28 pixels and each feature represents one pixel's intensity.
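Each feature holds a pixel intensity, which for MNIST runs from 0 (background) to 255 (full intensity). A quick sanity check, assuming X is the NumPy array loaded above:

# Pixel intensities in MNIST range from 0 to 255
print(X.min(), X.max())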

Now let's plot one of them to see what it looks like.


import matplotlib as mpl
import matplotlib.pyplot as plt

some_digit = X[0]
some_digit_image = some_digit.reshape(28, 28)

plt.imshow(some_digit_image, cmap=mpl.cm.binary, interpolation='nearest')
plt.axis("off")
plt.show()
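This looks like a 5, and the corresponding label confirms it:

# Check the label of the image we just plotted
y[0]

5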


Splitting train and test data

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
 
The training set is already shuffled for us. This is quite useful when it comes to cross-validation, since it makes all the cross-validation folds similar. Moreover, some algorithms perform poorly when they get many similar instances in a row, so shuffling the dataset ensures that won't happen.
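If you ever work with a dataset that is not pre-shuffled, a minimal sketch of shuffling the training set with NumPy (reusing the variable names from above) could look like this:

import numpy as np

# Reorder the training set with a random permutation of its indices
shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]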

Training the SVM Classifier

# Training the SVM classifier

from sklearn.svm import SVC

svm_clf = SVC(decision_function_shape = 'ovo')

svm_clf.fit(X_train, y_train)


SVC(decision_function_shape='ovo')
# Prediction in training

training_prediction = svm_clf.predict(X_train[:5])
print("Prediction in training:",training_prediction)
print("Actual values:",y_train[:5])


Prediction in training: [5 0 4 1 9]
Actual values: [5, 0, 4, 1, 9]

The predicted and actual values match. However, eyeballing a handful of correct predictions is not the best way to evaluate a model's performance.

Testing the SVM Classifier

# Prediction in testing

testing_prediction = svm_clf.predict(X_test[:5])
print("Prediction in testing:",test_prediction)
print("Actual values:",y_test[:5])

Prediction in testing: [7 2 1 0 4]
Actual values: [7, 2, 1, 0, 4]

Hmm! Not bad, the model is able to predict the exact values when tested on the first 5 instances of the test set. But hold on, we still need more precise measures to check the model's performance.

Evaluating the Classifier

A better way to evaluate the performance of a classifier is to look at the confusion matrix. The main use of the confusion matrix is to see how often instances of one class are misclassified as another. For instance, to know how many times the classifier misclassified images of 7s as 1s, you would look at the row for 7 and the column for 1 of the confusion matrix.
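As a sketch of that kind of lookup, assuming we compute the full multiclass confusion matrix on the test set with the classifier trained above (predicting all 10,000 test images with an SVM can take a while):

from sklearn.metrics import confusion_matrix

# Rows are the actual digits, columns are the predicted digits
y_test_pred = svm_clf.predict(X_test)
cm = confusion_matrix(y_test, y_test_pred)

# Number of images of a 7 that were misclassified as a 1
print(cm[7, 1])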


To compute a confusion matrix, we first need some predicted values to compare against the actual values. Here we turn the task into a binary one and compute the confusion matrix for a single digit: detecting whether or not an image is a 5.

y_train_5 = (y_train==5)
y_test_5 = (y_test==5)



from sklearn.model_selection import cross_val_predict

y_train_predict = cross_val_predict(svm_clf, X_train, y_train_5, cv=3)



from sklearn.metrics import confusion_matrix

confusion_matrix(y_train_5, y_train_predict)


array([[54497,    82],
       [  312,  5109]], dtype=int64)

The cross_val_predict() function performs K-fold cross-validation and returns the predictions made by the classifier on each test fold. In the confusion matrix, each row represents an actual class and each column represents a predicted class. Since we are detecting 5s here, the first row of the matrix represents the non-5 images (the negative class): 54,497 were correctly classified as non-5s (true negatives), while the remaining 82 were wrongly classified as 5s (false positives). The second row represents the images of 5s (the positive class): 312 were wrongly classified as non-5s (false negatives), while the remaining 5,109 were correctly classified as 5s (true positives). So, a perfect classifier would have only true positives and true negatives.
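If you want the four counts as separate numbers, NumPy's ravel() flattens the 2 x 2 matrix in the order TN, FP, FN, TP; a small sketch reusing the matrix computed above:

# Unpack the binary confusion matrix into its four counts
tn, fp, fn, tp = confusion_matrix(y_train_5, y_train_predict).ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)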

The confusion matrix gives a lot of information, but sometimes we need more concise metrics. Precision is the accuracy of the positive predictions, i.e., the fraction of the values predicted as positive that really are positive.

Precision = TP / (TP + FP)

Meanwhile, recall (also called sensitivity) is the fraction of the actual positive instances that are correctly detected by the classifier.

Recall = TP / (TP + FN)

Precision and recall can be combined into a single metric called the F1 score, which is the harmonic mean of precision and recall.

F1 score = TP / (TP + (FN + FP) / 2)
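Plugging the counts unpacked in the sketch above into these formulas is a useful sanity check:

# Compute precision, recall and F1 directly from the confusion-matrix counts
precision_manual = tp / (tp + fp)
recall_manual = tp / (tp + fn)
f1_manual = tp / (tp + (fn + fp) / 2)

print(precision_manual, recall_manual, f1_manual)

Scikit-learn also provides these metrics directly: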

from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_train_5, y_train_predict)
recall = recall_score(y_train_5, y_train_predict)
f1score = f1_score(y_train_5, y_train_predict)

print("Precision:",precision)
print("Recall:", recall)
print("f1_score:",f1score)


Precision: 0.9842034290117511
Recall: 0.9424460431654677
f1_score: 0.9628722201281568

The SVM binary classifier has good precision and recall, as well as a good F1 score. The model is correct 98.4% of the time when it claims an image is a 5, and it detects 94.2% of all the 5s in the training set. A classifier only gets a high F1 score if both its precision and recall are high.
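Since the F1 score is the harmonic mean of precision and recall, you can also recover it from the two scores computed above:

# F1 as the harmonic mean of precision and recall
f1_from_pr = 2 * precision * recall / (precision + recall)
print(f1_from_pr)  # should match the f1_score value printed earlier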