A Detailed Overview of the Basics of Support Vector Machines in Machine Learning

This article gives a detailed overview of the basics you need to know about the support vector machine algorithm in machine learning.

While there are countless machine learning algorithms to choose from, most models fall into one of two categories: supervised and unsupervised. Supervised machine learning involves training an algorithm on data that comes with pre-defined target values, while unsupervised algorithms are trained on raw data that has no pre-defined target value. Support vector machines are a type of supervised machine learning algorithm: they learn a decision boundary from training examples whose class labels are known in advance.

An Introduction to SVM

The foundations of SVM were laid by Vladimir Vapnik and Alexey Chervonenkis in the 1960s, and the modern form of the algorithm was developed by Vapnik and his colleagues in the 1990s. The idea is that each observation is a point in space, and the algorithm finds a boundary that separates the points of one class from the points of the other. A small subset of the training points, the ones lying closest to this boundary, are called support vectors; the remaining points are non-support vectors and have no influence on where the boundary sits. To classify a new observation, we simply determine which side of the boundary it lies on, for example by computing its signed distance from the boundary.

Support vector machines have proven to be a very effective machine learning model for many types of classification problems. They work well in high-dimensional spaces and are memory-efficient, since only the support vectors are needed to define the trained model. Their drawbacks are that training can be slow on very large data sets and that they are sensitive to feature scaling and to the choice of kernel and parameters, so it's important to prepare the data and tune the model properly. There is a large body of research on how to find good parameter settings (hyperparameters) for SVM models, which is a topic for another post. Many packages support SVM, such as LIBSVM (C/C++) and scikit-learn (Python).
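
To make this concrete, here is a minimal sketch of training an SVM classifier with scikit-learn. The dataset and the parameter values (an RBF kernel with C=1.0) are illustrative choices, not recommendations:

```python
# A minimal sketch of training an SVM classifier with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a small built-in binary classification dataset as an example
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# SVMs are sensitive to feature scale, so standardize the features first
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Train a support vector classifier with an RBF kernel (illustrative settings)
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```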

[Figure: graphical representation of a support vector machine, showing the decision boundary, the margin hyperplanes, and the support vectors]

Some Important Terms Regarding SVM

Decision Boundary

A decision boundary is a line or surface that separates different regions in a data space. Each region is associated with a class label, and the decision boundary is used to predict the class label of new data points.

In the Support Vector Machine (SVM) algorithm, the decision boundary can be a line or plane that separates the data space into two regions, one associated with the class label +1 and the other with the class label -1. The SVM algorithm finds the optimal decision boundary that maximizes the margin between the two regions.
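
As a toy illustration, a linear decision boundary in two dimensions can be written as w·x + b = 0, and a point's class is read off from the sign of w·x + b. The weights and bias below are made-up values, not learned ones:

```python
# A toy sketch of how a linear decision boundary assigns labels.
import numpy as np

w = np.array([2.0, -1.0])  # normal vector of the boundary (hypothetical)
b = -0.5                   # bias/intercept (hypothetical)

def predict(x):
    # Points with w.x + b > 0 fall on the +1 side, otherwise the -1 side
    return 1 if np.dot(w, x) + b > 0 else -1

print(predict(np.array([1.0, 0.0])))  # w.x + b = 1.5  -> class +1
print(predict(np.array([0.0, 1.0])))  # w.x + b = -1.5 -> class -1
```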

Hyperplane

A hyperplane is the general form of a linear decision boundary used to classify data. In two dimensions it is a line, in three dimensions it is a plane, and in an n-dimensional data space it is a flat surface of dimension n-1. The support vector machine algorithm works by finding the hyperplane that best separates the data.

Support Vectors

The support vectors in a support vector machine algorithm are the training points that lie closest to the separating hyperplane between the two classes. These points are used to calculate the optimal decision boundary: when the algorithm searches for the hyperplane that best separates the two classes, only the support vectors constrain where it can sit, and the remaining points could be removed without changing the result.
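
In scikit-learn, the support vectors of a fitted classifier can be inspected directly. A short sketch with made-up toy data:

```python
# Inspecting the support vectors of a fitted scikit-learn SVC.
import numpy as np
from sklearn.svm import SVC

# Illustrative toy data: two small clusters
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print("Support vectors:\n", clf.support_vectors_)  # points nearest the boundary
print("Indices in X:", clf.support_)               # their row indices
print("Per-class counts:", clf.n_support_)
```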

Marginal Distance

The marginal distance (or margin) in the Support Vector Machine algorithm is the distance between the decision boundary and the closest points in the training data set. Increasing the margin can lead to improved generalization performance, i.e., better out-of-sample prediction.
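
For a linear SVM, the full margin width can be recovered from the learned weight vector w as 2 / ||w||. A minimal sketch with illustrative toy data (the large C value approximates a hard margin):

```python
# Recovering the margin width of a linear SVM as 2 / ||w||.
import numpy as np
from sklearn.svm import SVC

# Illustrative toy data: two well-separated groups
X = np.array([[1, 1], [2, 2], [5, 5], [6, 6]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C ~ hard margin
w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)
print("Margin width:", margin)
```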

Linear Separability

Linear separability is a term used in mathematics, statistics, and machine learning to describe a set of data points that can be split into two groups using a single line. A line is said to be a linear separator for a set of data points if the points can be divided into two groups such that all points in one group are on one side of the line and all points in the other group are on the other side.
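
One informal way to check linear separability in practice: if a linear SVM with a very large C (approximating a hard margin) fits the training data perfectly, a separating line exists. A sketch with made-up points:

```python
# An informal linear-separability check using a near-hard-margin linear SVM.
import numpy as np
from sklearn.svm import SVC

# Illustrative toy data: two groups that a single line can divide
X = np.array([[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("Training accuracy:", clf.score(X, y))  # 1.0 -> a separating line exists
```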

Non-linear separability

Non-linearly separable data is data that cannot be separated by a straight line (or, in higher dimensions, a flat hyperplane). This type of data is more difficult to work with because no linear boundary can split it into groups. The support vector machine algorithm can still classify such data: with the help of a kernel function (described next), it finds a boundary that is linear in a higher-dimensional space but curved in the original one, as the sketch below shows.
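
Concretely, here is a comparison of a linear kernel and an RBF kernel on concentric circles, a classic non-linearly separable data set; the parameter values are scikit-learn defaults, not tuned choices:

```python
# Linear vs. RBF kernel on data that is not linearly separable.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: no straight line can separate them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf").fit(X, y)

print("Linear kernel accuracy:", linear_clf.score(X, y))  # typically near chance
print("RBF kernel accuracy:", rbf_clf.score(X, y))        # typically near perfect
```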

Kernel Function

Kernel functions are used in the Support Vector Machine algorithm to map data into a higher-dimensional space so that the data points can be separated more easily. Kernel functions can be linear or non-linear; common non-linear choices include the polynomial and RBF (Gaussian) kernels. Simply put, kernel functions are used when the data is not linearly separable in its original, lower-dimensional space.
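
The RBF (Gaussian) kernel, one of the most common choices, scores the similarity of two points as K(a, b) = exp(-gamma * ||a - b||²). A minimal hand-rolled sketch (the gamma value is illustrative):

```python
# The RBF kernel: similarity decays with squared distance between points.
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # Acts like a dot product in an (implicit) higher-dimensional space
    return np.exp(-gamma * np.sum((a - b) ** 2))

a = np.array([1.0, 2.0])
b = np.array([2.0, 3.0])
print(rbf_kernel(a, a))  # 1.0: a point is maximally similar to itself
print(rbf_kernel(a, b))  # < 1.0: similarity decays with distance
```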

How Does SVM Work?

Support Vector Machine works by constructing a hyperplane between the two classes of data. The best separating hyperplane is not simply any line that keeps the classes apart; it is the one that lies as far as possible from the nearest data points of both classes, i.e., the one with the widest margin on each side.

To find this hyperplane, SVM solves a constrained optimization problem. Rather than fitting a line through the data (as linear regression does), it searches for the boundary with the widest possible margin while penalizing, or in the hard-margin case forbidding, training points that fall on the wrong side. This trade-off is usually written as minimizing the size of the weight vector plus a penalty term for margin violations, as sketched below.
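
The sketch below spells out the standard soft-margin objective: minimize half the squared norm of w plus C times the total hinge loss, where the hinge loss penalizes points that fall inside the margin or on the wrong side. The data and parameter values are illustrative:

```python
# A rough sketch of the soft-margin SVM objective.
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    # y is in {-1, +1}; margins >= 1 incur zero hinge loss
    margins = y * (X @ w + b)
    hinge = np.maximum(0, 1 - margins)
    return 0.5 * np.dot(w, w) + C * np.sum(hinge)

# Illustrative toy data and a hand-picked candidate boundary
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, -1, -1])
print(svm_objective(np.array([0.5, 0.5]), 0.0, X, y))  # training aims to minimize this
```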

The optimization is driven by the support vectors, which are illustrated in the graph. Support vectors are the data points that lie closest to the separating hyperplane, sitting on the edges of the margin. They are the most important data points because they alone define where the separating hyperplane lies.

The decision boundary, or decision hyperplane, chosen by SVM is the one that maximizes the margin between the two classes. In fact, the main aim of SVM is to make the marginal distance (as shown in the graph) as large as possible so as to reduce the generalization error; in other words, increasing the margin improves the performance of the model on unseen data. Intuitively, as the margin grows, the chance of predicting the wrong class shrinks, because the decision boundary now sits far from the points of both classes.

So when considering a new data point, if it lies nearer to the negative margin hyperplane (shown in the graph), it is more likely to belong to the negative class, and if it lies nearer to the positive margin hyperplane, it is more likely to belong to the positive class.

When training the SVM model, we first need to find the support vectors. Then we find the hyperplane that maximizes the margin between the two classes; this hyperplane is the decision boundary.

Once we have the decision boundary, we can use it to predict the class of new data points. To do this, we just check which side of the decision boundary a new point falls on and assign it the class label associated with that side.
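
With scikit-learn, this prediction step looks like the following sketch: predict() returns the class label, and decision_function() returns a signed score whose sign indicates the side of the boundary. The toy data reuses the made-up points from the support-vectors sketch above:

```python
# Predicting new points with a fitted SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = SVC(kernel="linear").fit(X, y)

new_points = np.array([[2, 2], [7, 7]])
print(clf.predict(new_points))            # predicted class labels
print(clf.decision_function(new_points))  # sign -> side of boundary; magnitude -> how far
```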

Applications of SVM

Some applications of SVM include:
  • Sentiment analysis: A support vector machine can be used to classify the sentiment of a text as positive, negative, or neutral.
  • Spam detection: Can be used to classify emails as spam or not spam.
  • Image recognition: Can be used to identify objects in images.
  • Fraud detection: Can be used to identify fraudulent activity.
  • Stock prediction: Can be used to predict future stock prices.
  • Weather prediction: Can be used to predict future weather patterns.
  • Disease diagnosis: Can be used to diagnose diseases like cancer.

Conclusion

So in this article, we've learned the basics of SVM, including the introduction, the key terms, how the algorithm works, and some applications of SVM. A complete explanation, including the math and an implementation in Python, will follow in future posts.