Understanding Support Vector Machine (SVM) kernels can be challenging, especially if you're just starting out with data science. But never fear! This article introduces SVM kernels, especially the polynomial kernel, and walks you through how to use one in Python from scratch with Pandas and NumPy. So let's get started.
SVM Kernels: Polynomial Kernel - From Scratch Using Python.
SVM is an algorithm that has shown great success in the field of classification. It separates the data into different categories by finding the hyperplane that maximizes the margin, that is, the distance between the hyperplane and the nearest points of each class. Kernel functions extend this idea to data that a flat boundary cannot separate. They are a powerful tool for working in high-dimensional spaces: they let us fit a linear discriminant to data with nonlinear structure, which can give higher accuracy and robustness than a linear model alone.
A kernel function is a mathematical function that lets the SVM work as if the low-dimensional input had been mapped into a higher-dimensional feature space. Formally, it returns the dot product of two points after that mapping. With a suitable mapping, data that is not linearly separable in the original space can become linearly separable in the new one, so a support vector machine can find a hyperplane that separates it.
For example, if the input 𝑥 is two-dimensional, the mapping might send it into a three-dimensional space in which the classes can be separated by a plane.
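To make the 2D to 3D idea concrete, here is a minimal NumPy sketch. The two-ring toy data, the helper name `lift_to_3d`, and the choice of x1^2 + x2^2 as the third coordinate are assumptions made for illustration; the article does not prescribe them.

```python
import numpy as np

def lift_to_3d(X):
    """Append x1^2 + x2^2 as a third coordinate (an illustrative mapping)."""
    X = np.asarray(X, dtype=float)
    radius_sq = (X ** 2).sum(axis=1)
    return np.column_stack([X, radius_sq])

# Toy data: an inner ring (class 1) surrounded by an outer ring (class 0).
angles = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([0.5 * ring, 1.0 * ring])
y = np.array([1] * 50 + [0] * 50)

Z = lift_to_3d(X)

# In 3D, the flat plane "third coordinate = 0.6" splits the two rings,
# even though no straight line separates them in the original 2D space.
pred = (Z[:, 2] < 0.6).astype(int)
accuracy = (pred == y).mean()
print(Z.shape, accuracy)
```

The threshold 0.6 is hand-picked for this toy data; an SVM would learn the separating plane instead.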
In addition, on some problems such as handwriting recognition and face detection, kernel SVMs can be competitive with other algorithms such as neural networks or tree ensembles, because the kernel function captures nonlinear relationships between data points.
Kernels are useful because they make it possible to separate data that is not linearly separable. When a basic linear SVM cannot separate the data, we can use the kernel trick: the data is treated as if it were mapped to a higher dimension, which gives it extra dimensions to spread out in and often yields more accurate results.
Kernels can also reduce the errors of the SVM algorithm, for the same reason. In the higher-dimensional space the data is often closer to linearly separable, so the SVM can find a hyperplane that separates it with higher accuracy and lower error.
There are many different kernel functions that can be used. Some of the most common kernel functions are the polynomial kernel, the RBF kernel, and the sigmoid kernel.
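As a rough sketch, the three kernels named above can each be written as a short NumPy function using their standard textbook forms. The function names, the parameter names `gamma`, `c`, and `d`, and their default values are assumptions for illustration.

```python
import numpy as np

def polynomial_kernel(x, z, c=1.0, d=2):
    """(x . z + c)^d"""
    return (np.dot(x, z) + c) ** d

def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2)"""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

def sigmoid_kernel(x, z, gamma=1.0, c=0.0):
    """tanh(gamma * x . z + c)"""
    return np.tanh(gamma * np.dot(x, z) + c)

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

poly_value = polynomial_kernel(x, z)   # x . z = 4, so (4 + 1)^2
rbf_same = rbf_kernel(x, x)            # identical points give the maximum value
sig_value = sigmoid_kernel(x, z)
print(poly_value, rbf_same, sig_value)
```

Each function takes two points from the original space and returns a single similarity score; only the polynomial kernel is used in the rest of this article.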
A polynomial kernel is a kind of SVM kernel that corresponds to mapping the data into a higher-dimensional space made up of polynomial combinations of the original features. The kernel value for two data points is the dot product of their images in that new space, and it can be computed directly from their dot product in the original space.
The polynomial kernel is often used in SVM classification problems where the data is not linearly separable. By mapping the data into a higher-dimensional space, the polynomial kernel can sometimes find a hyperplane that separates the classes.
The polynomial kernel has parameters that can be tuned to improve its performance, including the degree d of the polynomial and the constant term c.
For degree d polynomials, the polynomial kernel is defined as:
K(x1, x2) = (x1 · x2 + c)^d
where c is a constant and x1 and x2 are vectors in the original space.
The parameter c controls how much weight the lower-degree terms get relative to the higher-degree terms: with c = 0 only terms of exactly degree d appear, while a larger c gives the lower-degree terms more influence. It should not be confused with the SVM regularization parameter C, which controls the trade-off between fitting the training data and the size of the margin: a large C gives a low training error but may overfit, while a small C gives a wider margin and a higher training error and may underfit. The degree d of the polynomial controls the complexity of the model. A high degree d will result in a more complex model that may overfit the data, while a low degree d will result in a simpler model that may underfit the data.
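Written as code, the degree-d polynomial kernel computes (x1 · x2 + c)^d for every pair of points. The sketch below is one possible vectorized version; the function name `polynomial_gram` and the three sample points are assumptions for illustration.

```python
import numpy as np

def polynomial_gram(A, B, c=1.0, d=2):
    """Kernel matrix whose (i, j) entry is (a_i . b_j + c)^d."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return (A @ B.T + c) ** d

X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [-1.0, 0.5]])

K_c0 = polynomial_gram(X, X, c=0.0, d=2)  # no constant term
K_c1 = polynomial_gram(X, X, c=1.0, d=2)  # constant term added before squaring
K_d3 = polynomial_gram(X, X, c=1.0, d=3)  # higher degree, same constant

print(K_c0)
print(K_c1)
print(K_d3)
```

Comparing the three matrices shows how the same pair of points receives a different similarity score as c and d change.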
When a dataset has two features x1 and x2 (here x1 and x2 denote the two features of a single sample, not two data points), expanding the degree-2 polynomial produces terms built from these features.
The important terms we need to note are x1, x2, x1^2, x2^2, and x1 * x2. Computing these new terms maps the non-linear dataset into a higher-dimensional space that has the additional features x1^2, x2^2, and x1 * x2.
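To see why these are the terms that matter, take two samples x = (x1, x2) and z = (z1, z2). Expanding (x · z + c)^2 gives x1^2 z1^2 + x2^2 z2^2 + 2 x1 x2 z1 z2 + 2c x1 z1 + 2c x2 z2 + c^2, which is the dot product of two feature vectors built from the squares, the product, and the original features. The sketch below checks this numerically. The helper names `phi` and `poly_kernel`, the sample values, and the sqrt(2) and sqrt(2c) scaling factors come from this expansion, not from the article's own code.

```python
import numpy as np

def phi(x, c=1.0):
    """Explicit degree-2 feature map for a sample with two features."""
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2.0) * x1 * x2,
        np.sqrt(2.0 * c) * x1,
        np.sqrt(2.0 * c) * x2,
        c,
    ])

def poly_kernel(x, z, c=1.0):
    """Degree-2 polynomial kernel computed in the original 2D space."""
    return (np.dot(x, z) + c) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # dot product in the 6D feature space
implicit = poly_kernel(x, z)       # same quantity without building phi
print(explicit, implicit)
```

The two values agree, which is the kernel trick in miniature: the kernel gives the higher-dimensional dot product without ever constructing the higher-dimensional vectors.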
Alright, now let's do the practical implementation of the polynomial kernel in Python. For this demo, we need a random dataset, so let's create a non-linearly separable dataset using sklearn.
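The original generator call is not reproduced here. One way to get a comparable dataset is sklearn's `make_circles`; the choice of generator, the parameter values, and the column names below are all assumptions, so results will differ from the figures quoted later.

```python
import pandas as pd
from sklearn.datasets import make_circles

# Assumed stand-in dataset: two noisy concentric rings, one per class.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=42)

# Keep the data in a DataFrame with named feature columns x1 and x2.
df = pd.DataFrame(X, columns=["x1", "x2"])
df["y"] = y
print(df.head())
```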
Now let's plot the dataset.
This dataset is not linearly separable since the two classes are intermixed. A basic linear SVM will not be able to classify it with high accuracy, so we need to transform this 2D dataset into a higher-dimensional space.
In the previous article, we implemented the SVM algorithm from scratch in Python. Here is the link to the article: Implementing Support Vector Machine Algorithm from Scratch in Python
Now we are going to modify this algorithm to support the polynomial kernel function.
To bring polynomial features to our SVM algorithm we need to add two things: a new parameter kernel that specifies which type of kernel to use, and a method that transforms the dataset from a lower dimension to a higher dimension.
The new method transform_poly will look like this:
Here we compute the squares and the product of features x1 and x2, as discussed in the equation above, and then convert the pandas DataFrame to a NumPy array. Finally, the resulting values of X and Y are returned. If there are no Y values, only X is returned.
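The method described above could look roughly like the following. This is a reconstruction from the description, not the article's original code: it is written as a standalone function rather than a class method, and the column names and the assumption that X has exactly two feature columns are mine.

```python
import numpy as np
import pandas as pd

def transform_poly(X, Y=None):
    """Add x1^2, x2^2 and x1 * x2 to a two-feature dataset."""
    # Work on a DataFrame with named columns (assumes exactly two features).
    X = pd.DataFrame(np.asarray(X, dtype=float), columns=["x1", "x2"])

    # New features: the squares and the product of the original features.
    X["x1^2"] = X["x1"] ** 2
    X["x2^2"] = X["x2"] ** 2
    X["x1 * x2"] = X["x1"] * X["x2"]

    # Convert the DataFrame to a NumPy array for the SVM.
    X = X.to_numpy()

    # Return only X when no labels were passed in.
    if Y is None:
        return X
    return X, np.asarray(Y)

X_new = transform_poly([[1.0, 2.0], [3.0, 4.0]])
X_new2, Y_new = transform_poly([[1.0, 2.0], [3.0, 4.0]], [1, -1])
print(X_new)
```

Each row goes from two columns to five: x1, x2, x1^2, x2^2, and x1 * x2.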
Inside the fit method, we can call the transform_poly method just like this:
So if we create the SVM with the kernel poly, the data X and Y are transformed from a lower dimension to a higher dimension.
Inside the predict method, we can do the same:
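Putting the pieces together, a self-contained sketch of the idea might look like this. It is not the implementation from the previous article: the class name `SketchSVM`, the sub-gradient descent training loop, the hyperparameter values, the NumPy-only version of `transform_poly`, and the two-ring toy data are all assumptions chosen to keep the example short.

```python
import numpy as np

class SketchSVM:
    """Linear SVM trained by sub-gradient descent on the hinge loss."""

    def __init__(self, kernel=None, lr=0.1, lam=0.01, epochs=1000):
        self.kernel = kernel  # None for a linear fit, "poly" for polynomial features
        self.lr = lr          # learning rate
        self.lam = lam        # L2 regularization strength
        self.epochs = epochs  # number of passes over the data

    def transform_poly(self, X):
        # Append x1^2, x2^2 and x1 * x2 to the two original features.
        x1, x2 = X[:, 0], X[:, 1]
        return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

    def _prepare(self, X):
        X = np.asarray(X, dtype=float)
        # Apply the same transformation in fit and predict.
        return self.transform_poly(X) if self.kernel == "poly" else X

    def fit(self, X, y):
        X = self._prepare(X)
        y = np.where(np.asarray(y) <= 0, -1.0, 1.0)  # labels as -1 / +1
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.epochs):
            margins = y * (X @ self.w + self.b)
            viol = margins < 1  # points inside the margin or misclassified
            grad_w = self.lam * self.w - (y[viol] @ X[viol]) / n_samples
            grad_b = -y[viol].sum() / n_samples
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def predict(self, X):
        X = self._prepare(X)
        return np.where(X @ self.w + self.b >= 0, 1, -1)

# Toy data: inner ring labelled +1, outer ring labelled -1.
angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([0.5 * ring, 1.0 * ring])
y = np.array([1] * 100 + [-1] * 100)

acc_linear = (SketchSVM().fit(X, y).predict(X) == y).mean()
acc_poly = (SketchSVM(kernel="poly").fit(X, y).predict(X) == y).mean()
print(acc_linear, acc_poly)
```

The key design point is that fit and predict go through the same transformation, so the model never sees untransformed data when the kernel is set to poly. On this toy data the polynomial variant should separate the rings while the linear one cannot.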
That's it! Our SVM algorithm is good to go for classification.
As you can see, the accuracy is below 0.5 with a linear fit, which is pretty low, so let's go ahead and train the SVM with a polynomial kernel.
That's nice! The accuracy of the model increased from 0.48 to 0.76 with a polynomial fit instead of a linear fit.
Before transforming the dataset:
After transforming the dataset: