[5] Solving Nonlinear Problems with a Kernel SVM

2021. 1. 4. 14:05 | English/Machine learning algorithm


Algorithms such as the linear SVM and logistic regression cannot separate classes that are not linearly separable.

 

(Figures: linearly separable data vs. data that is not linearly separable)

 

Kernel methods, which use a mapping function \(\phi\), can solve such nonlinear problems.

 

Using the mapping function, nonlinear combinations of the original features are projected into a higher-dimensional space where the classes become linearly separable. A separating hyperplane is found in that space, and projecting it back into the original feature space yields a nonlinear decision boundary.

 

Ex) \(\phi(x_1,x_2) = (z_1,z_2,z_3) = (x_1,x_2,x_1^2+x_2^2)\)

Decision regions can be defined by a projection process of '2D -> 3D -> 2D'.
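
As a small sketch of this idea (the random circular data below is only an illustration, but the mapping is exactly the \(\phi\) above), the third coordinate \(z_3 = x_1^2+x_2^2\) alone separates the two classes with a flat plane in 3D, which corresponds to a circle back in 2D:

import numpy as np

np.random.seed(0)

# illustrative data: points inside vs. outside the unit circle (not linearly separable in 2D)
X = np.random.uniform(-2, 2, size = (200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 1.0, 1, -1)

# phi(x1, x2) = (z1, z2, z3) = (x1, x2, x1^2 + x2^2)
Z = np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2 + X[:, 1]**2])

# in the projected space the plane z3 = 1 separates the classes;
# mapped back to 2D it becomes the circle x1^2 + x2^2 = 1
print(Z[y == 1, 2].max())    # below 1 (inside class)
print(Z[y == -1, 2].min())   # at least 1 (outside class)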

 

 

 

However, the problem with mapping functions is that constructing the new features is computationally expensive, and the cost grows quickly for high-dimensional data. To avoid this, we use a kernel function, defined as a dot product between two mapped samples.

 

kernel function : \(K(x^{(i)}, x^{(j)}) = \phi(x^{(i)})^{T}\phi(x^{(j)})\)
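
The advantage is that \(K\) can often be computed directly in the original feature space, without ever constructing \(\phi(x)\). As a quick sanity check (this degree-2 polynomial kernel is just an illustration, not the kernel used below): for \(K(a,b) = (a^{T}b)^2\) in 2D, the corresponding mapping is \(\phi(x_1,x_2) = (x_1^2, \sqrt{2}x_1x_2, x_2^2)\), and both routes give the same value.

import numpy as np

def phi(x):
    # explicit mapping for the degree-2 polynomial kernel in 2D
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    # same quantity computed directly in the original 2D space
    return (a @ b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

print(phi(a) @ phi(b))       # ~1.0 : dot product in the mapped 3D space
print(poly2_kernel(a, b))    # 1.0 : kernel value, no 3D features needed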

 

 

 

One of the most widely used kernels is the radial basis function (RBF) kernel, also known as the Gaussian kernel.

 

\(K(x^{(i)}, x^{(j)}) = \exp\left(-\frac{||x^{(i)} - x^{(j)}||^2}{2\sigma^2}\right) = \exp(-\gamma||x^{(i)} - x^{(j)}||^2)\)

* \(\gamma =\frac{1}{2\sigma^2}\)

 

\(\gamma\) is a parameter that controls the width of the Gaussian bell, and therefore the radius of influence of each support vector. Increasing \(\gamma\) reduces that influence and reach; as a result, the decision boundary follows the training samples more closely and becomes more tightly curved.

 

 

The term kernel can also be interpreted as a similarity function between samples. The negative sign turns the distance measure into a similarity score, and the exponential function maps that score into the range between 0 (very dissimilar samples) and 1 (identical samples).
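
A quick numerical check of this interpretation (the sample values are only illustrative):

import numpy as np

def rbf_kernel(a, b, gamma = 1.0):
    # similarity score in (0, 1]: 1 for identical samples, toward 0 for distant ones
    return np.exp(-gamma * np.sum((a - b)**2))

x = np.array([1.0, 2.0])
print(rbf_kernel(x, x))                       # 1.0      (identical samples)
print(rbf_kernel(x, np.array([1.5, 2.5])))    # ~0.61    (nearby sample)
print(rbf_kernel(x, np.array([5.0, 6.0])))    # ~1.3e-14 (distant sample)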

 

 

 

Using SVM to solve nonlinear problems

 

Load libraries

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn.svm import SVC
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

 

Create non-linear data

 

np.random.seed(1)

# Create nonlinear dataset
X_xor = np.random.randn(200, 2)
y_xor = np.logical_xor(X_xor[:, 0] > 0,
                      X_xor[:, 1] > 0)
y_xor = np.where(y_xor, 1, -1)

 

Draw a scatter plot

# scatter plot
plt.scatter(X_xor[y_xor == 1, 0],
           X_xor[y_xor == 1, 1],
           c = 'b', marker = 'x',
           label = '1')
plt.scatter(X_xor[y_xor == -1, 0],
           X_xor[y_xor == -1, 1],
           c = 'r', marker = 's',
           label = '-1')
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.legend(loc = 'best')
plt.tight_layout()
plt.show()

 

Define a decision-region plotting function

# define function about visualizing decision_regions
def plot_decision_regions(X, y, classifier, test_idx = None, resolution = 0.02):
    
    # set marker and colormap
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])
    
    # draws a decision boundary.
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                          np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha = 0.3, cmap = cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())
    
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(X[y == cl, 0], X[y == cl, 1],
                   alpha = 0.8, c = colors[idx],    # alpha : marker transparency
                   marker = markers[idx], label = cl,
                   edgecolor = 'black')
        
    if test_idx:
        X_test, y_test = X[test_idx, :], y[test_idx]
        
        plt.scatter(X_test[:, 0], X_test[:, 1],
                   facecolors = 'none', edgecolor = 'black', alpha = 1,
                   s = 100, label = 'test set')

 

Train the model and visualize the decision regions

svm = SVC(kernel = 'rbf', random_state = 1, gamma = 0.1, C = 10)
svm.fit(X_xor, y_xor)
plot_decision_regions(X_xor, y_xor, classifier = svm)
plt.legend(loc = 'upper left')
plt.tight_layout()
plt.show()

We can confirm that the decision regions separate the classes along a nonlinear boundary.
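
As described earlier, increasing \(\gamma\) shrinks the reach of each support vector and makes the boundary hug the training samples. Re-fitting the same model with a much larger value (gamma = 100 here is only an illustrative choice) makes this visible:

# same data and settings, but with a much larger gamma:
# the decision regions wrap tightly around the training samples (likely overfitting)
svm_large_gamma = SVC(kernel = 'rbf', random_state = 1, gamma = 100.0, C = 10)
svm_large_gamma.fit(X_xor, y_xor)
plot_decision_regions(X_xor, y_xor, classifier = svm_large_gamma)
plt.legend(loc = 'upper left')
plt.tight_layout()
plt.show()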



A kernel SVM can also be applied to linearly separable data; in that case the resulting decision regions come out close to the linear ones.

(Figures: Iris classification using a linear kernel vs. using an RBF kernel)
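
A sketch of how such a comparison could be produced with the libraries imported earlier (the two petal features, C, and gamma values are illustrative choices, not fixed by the post):

# load Iris and keep two features so the decision regions can be plotted in 2D
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]   # petal length, petal width
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size = 0.3, random_state = 1, stratify = y)

# standardize features
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

X_combined_std = np.vstack((X_train_std, X_test_std))
y_combined = np.hstack((y_train, y_test))

# linear kernel vs. RBF kernel with a small gamma
for params in [dict(kernel = 'linear'), dict(kernel = 'rbf', gamma = 0.2)]:
    svm = SVC(C = 1.0, random_state = 1, **params)
    svm.fit(X_train_std, y_train)
    plot_decision_regions(X_combined_std, y_combined, classifier = svm,
                          test_idx = range(len(y_train), len(y_combined)))
    plt.xlabel('petal length [standardized]')
    plt.ylabel('petal width [standardized]')
    plt.legend(loc = 'upper left')
    plt.tight_layout()
    plt.show()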

 
