## Introduction

Support vector machines (SVMs) are among the most powerful and practical machine learning algorithms ever designed. They can be used to predict the outcomes of marketing campaigns, model user behavior on websites, classify results in medical tests, and much more. In this article, we're going to focus on the concept of a primal support vector machine (SVM) in detail, including its math and derivations. So let's get into it.

## What is a Primal Support Vector Machine?

The first thing to understand is that SVM is not limited to two dimensions; it works in any number of dimensions. The data points encountered in a problem can be either linearly separable or non-linearly separable.

If the data points of the classes are linearly separable, we can formulate the optimization problem directly using the basic form of SVM, known as the Primal formulation of SVM. But when the data points are not linearly separable, the Primal formulation alone is not enough: we need something known as the Dual Form of SVM, which makes it possible to map the data into higher dimensions where it becomes separable. In this article, however, we are focusing on the basic concept of SVM, i.e., the Primal formulation of SVM.

## Some important SVM terms

*SVM Diagram: the separating hyperplane with its positive and negative marginal hyperplanes, support vectors, and margin.*

### Hyperplane

A hyperplane is the decision boundary that separates the two classes. In two dimensions it is simply a line, in three dimensions a plane, and in higher dimensions a general hyperplane.

### Support Vectors

Support vectors are the data points that lie closest to the separating hyperplane. They are the points that "support" the position of the hyperplane: moving any of them would change the boundary.

### Marginal Distance

The marginal distance (or margin) is the distance between the two marginal hyperplanes that pass through the support vectors on either side of the separating hyperplane.

## What is the goal of SVM?

The goal of SVM is to find the separating hyperplane that maximizes the marginal distance between the two classes, so that the resulting classifier generalizes as well as possible to unseen data points.

## Separating Hyperplane

In two dimensions, the equation of a straight line is **y = mx + b,** where

**m** is the slope, and

**b** is the intercept. But in SVM we write the separating hyperplane as $w^T x + b = 0$, and we need to formulate the same for the marginal planes, i.e., for both the negative and positive hyperplanes, as shown in the figure (SVM Diagram):

$$w^T x + b = +1 \quad \text{(positive hyperplane)}, \qquad w^T x + b = -1 \quad \text{(negative hyperplane)}.$$

Subtracting the second equation from the first for a point $x_1$ on the positive hyperplane and a point $x_2$ on the negative hyperplane, the bias terms cancel, multiplying **w transpose** so that we'll get the equation for **(x₁ − x₂):**

$$w^T (x_1 - x_2) = 2.$$

But since **w** is a vector and it has a direction, we need to divide the equation by the **norm** of **w** to get the perpendicular distance between the two marginal planes:

$$\frac{w^T}{\lVert w \rVert}(x_1 - x_2) = \frac{2}{\lVert w \rVert}.$$

For classification, whenever $w^T x + b$ is greater than or equal to +1 we take the predicted **y** value as +1, and whenever it is less than or equal to −1 we take the **y** value as −1; both cases can be combined into the single constraint $y_i (w^T x_i + b) \ge 1$. The margin $\frac{2}{\lVert w \rVert}$ is the quantity that needs to be maximized, and this term is known as the **regularizer.**
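The margin formula above can be checked numerically. The following is a minimal sketch, assuming an illustrative weight vector `w` and bias `b` (made-up values, not a trained model):

```python
import numpy as np

# Hypothetical 2-D hyperplane w^T x + b = 0 (illustrative values).
w = np.array([3.0, 4.0])
b = -2.0

# Marginal distance between the positive and negative hyperplanes: 2 / ||w||.
margin = 2.0 / np.linalg.norm(w)
print(margin)  # ||w|| = 5, so 2 / 5 = 0.4

def predict(x):
    """Classify a point by which side of the hyperplane it falls on."""
    return 1 if w @ x + b >= 0 else -1

print(predict(np.array([2.0, 1.0])))   # w@x + b = 6 + 4 - 2 = 8  -> +1
print(predict(np.array([-1.0, 0.0])))  # w@x + b = -3 - 2 = -5    -> -1
```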

## Soft margin SVM

Maximizing $\frac{2}{\lVert w \rVert}$ is equivalent to minimizing its reciprocal, so the **regularizer** can also be rewritten as follows:

$$\min_{w,\,b} \; \frac{\lVert w \rVert^2}{2}.$$

Real-world data is rarely perfectly separable, so the soft margin SVM tolerates some mistakes. It introduces a slack variable $\zeta_i$ that measures the error of each training point, a hyperparameter **C** that controls how strongly the **errors in the training** are penalized, and the **sum of the values of the errors ( Σζ )**. So the optimization term will be:

$$\min_{w,\,b,\,\zeta} \; \frac{\lVert w \rVert^2}{2} + C \sum_i \zeta_i \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 - \zeta_i, \;\; \zeta_i \ge 0.$$
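As a sketch, the soft margin objective can be evaluated numerically. The toy data, weights, and `C` below are illustrative assumptions; the slack of each point is computed as the amount by which it violates the margin constraint:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Soft margin SVM objective: ||w||^2 / 2 + C * sum(slack).

    slack_i = max(0, 1 - y_i (w^T x_i + b)) is how far point i is
    from satisfying the margin constraint y_i (w^T x_i + b) >= 1.
    """
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * slack.sum()

# Toy data: two points per class (illustrative values).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.array([0.25, 0.25])
b = 0.0
print(soft_margin_objective(w, b, X, y, C=1.0))  # all slacks are 0 here,
                                                 # so only ||w||^2/2 = 0.0625
```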

*Soft Margin SVM*

## Loss Function: Soft Margin SVM

Given the model's predictions **ŷ** for a given input **x** and the true labels **y,** we can consider the error to be zero when both of them match, and one otherwise. This approach is known as the **zero-one** loss. But this loss function is non-differentiable and is typically used in **combinatorial optimization** problems, which is a subfield of mathematical optimization; using this loss in SVM is really challenging.

Instead, SVM uses the **hinge loss** function:

$$\ell(t) = \max(0,\, 1 - t), \quad \text{where } t = y \cdot f(x) \text{ and } f(x) = w^T x + b.$$

If **t** ≥ 1, the hinge loss is zero, i.e., there is no penalty for a correct, confident prediction. If **t** ≤ **0,** the hinge loss returns a value of 1 or larger, growing linearly the further the prediction is from the correct side. Replacing each slack variable with the hinge loss of its point gives an unconstrained optimization term:

$$\min_{w,\,b} \; \frac{\lVert w \rVert^2}{2} + C \sum_i \max\bigl(0,\, 1 - y_i (w^T x_i + b)\bigr).$$
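The three regimes of the hinge loss (zero loss, margin violation, and wrong side) can be illustrated in a few lines. This is a minimal sketch of the formula max(0, 1 − t) with t = y · f(x):

```python
import numpy as np

def hinge_loss(y, f_x):
    """Hinge loss: max(0, 1 - t), where t = y * f(x)."""
    t = y * f_x
    return np.maximum(0.0, 1.0 - t)

# t >= 1: correct and outside the margin -> zero loss.
print(hinge_loss(1.0, 2.0))    # 0.0
# 0 < t < 1: correct but inside the margin -> small loss.
print(hinge_loss(1.0, 0.5))    # 0.5
# t <= 0: wrong side of the hyperplane -> loss of 1 or larger.
print(hinge_loss(-1.0, 1.5))   # 2.5
```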

Minimizing this objective (for example, with gradient descent) yields the optimal values of **w** and **b,** which we'll discuss in the next article.
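As a preview, the unconstrained objective above can be minimized with simple subgradient descent. This is only a sketch: the data, learning rate, and number of iterations are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: two Gaussian clusters (illustrative).
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(200):
    margins = y * (X @ w + b)
    active = margins < 1  # points currently contributing hinge loss
    # Subgradient of ||w||^2/2 + C * sum(max(0, 1 - y(w^T x + b))).
    grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w -= lr * grad_w
    b -= lr * grad_b

preds = np.sign(X @ w + b)
print((preds == y).mean())  # training accuracy on this easy data
```

On well-separated clusters like these, the learned hyperplane classifies essentially every training point correctly.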