[ML L3] SVM Intro

Date: 2020-06-26 10:39:26


A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After being given sets of labeled training data for each category, an SVM model is able to categorize new text.

So you’re working on a text classification problem. You’re refining your training data, and maybe you’ve even tried stuff out using Naive Bayes. But now you’re feeling confident in your dataset, and want to take it one step further. Enter Support Vector Machines (SVM): a fast and dependable classification algorithm that performs very well with a limited amount of data.

How does it work?

The basics of Support Vector Machines and how they work are best understood with a simple example. Let’s imagine we have two tags, red and blue, and our data has two features, x and y. We want a classifier that, given a pair of (x, y) coordinates, outputs whether it is red or blue. We plot our already labeled training data on a plane:

A support vector machine takes these data points and outputs the hyperplane (which in two dimensions is simply a line) that best separates the tags. This line is the decision boundary: anything that falls to one side of it we will classify as blue, and anything that falls to the other as red.

[image]

But what exactly is the best hyperplane? For SVM, it’s the one that maximizes the margin from both tags. In other words: the hyperplane (remember it’s a line in this case) whose distance to the nearest element of each tag is the largest.

[image]
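To make the idea of "distance to the nearest element" concrete, here is a minimal sketch that measures the margin of two candidate lines over a toy dataset and picks the wider one. The points, the candidate lines, and the helper names `margin` and `separates` are all made up for illustration; this only compares two hand-picked lines and is not an SVM solver.

```python
import math

def margin(w, b, points):
    """Smallest distance from any point to the line w[0]*x + w[1]*y + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

def separates(w, b, red, blue):
    """True if every red point is on the negative side and every blue point on the positive side."""
    def side(p):
        return w[0] * p[0] + w[1] * p[1] + b
    return all(side(p) < 0 for p in red) and all(side(p) > 0 for p in blue)

# Hypothetical toy data: two well-separated clusters.
red = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
blue = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]

# Two hand-picked candidate lines, each given as (w, b).
candidates = {
    "x + y = 5.5": ((1.0, 1.0), -5.5),
    "x = 3": ((1.0, 0.0), -3.0),
}

# Both lines separate the tags, but they leave different amounts of room.
for name, (w, b) in candidates.items():
    print(name, "separates:", separates(w, b, red, blue),
          "margin:", round(margin(w, b, red + blue), 3))

# The preferred line is the separating one with the largest margin.
best = max(candidates, key=lambda name: margin(*candidates[name], red + blue))
print("wider margin:", best)
```

Both candidates classify the toy points correctly; the margin is what tells them apart, which is the criterion an SVM optimizes over all possible hyperplanes.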

Nonlinear data?

Now this example was easy, since clearly the data was linearly separable — we could draw a straight line to separate red and blue. Sadly, usually things aren’t that simple. Take a look at this case:

[image]

We can introduce a new feature, a third dimension z, computed from x and y:

z = x^2 + y^2

[image]

That’s great! Note that since we are in three dimensions now, the hyperplane is a plane parallel to the x-y plane at a certain z (let’s say z = 1).

What’s left is mapping it back to two dimensions:

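[image]

In other words, we make a nonlinear dataset linearly separable by introducing a new dimension. Mapped back to two dimensions, the plane z = 1 becomes the circle x^2 + y^2 = 1.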
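The mapping can be sketched in a few lines. The example below assumes a made-up dataset in which one tag lies inside the unit circle and the other outside it; the names `lift`, `sample_ring` and `classify`, and the labels "inner" and "outer", are hypothetical. After adding z = x^2 + y^2, a single threshold on z (the plane z = 1) is enough to separate the two groups.

```python
import math
import random

def lift(x, y):
    """Map a 2D point to 3D by adding the new feature z = x^2 + y^2."""
    return (x, y, x * x + y * y)

def sample_ring(radius_min, radius_max, n, rng):
    """Draw n points whose distance from the origin lies in [radius_min, radius_max]."""
    points = []
    for _ in range(n):
        r = rng.uniform(radius_min, radius_max)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

rng = random.Random(0)
inner = sample_ring(0.0, 0.8, 50, rng)  # one tag: close to the origin
outer = sample_ring(1.2, 2.0, 50, rng)  # other tag: surrounds the first

def classify(x, y, threshold=1.0):
    """Classify with the plane z = threshold in the lifted 3D space."""
    _, _, z = lift(x, y)
    return "inner" if z < threshold else "outer"

# No straight line in the (x, y) plane separates these two groups,
# but in the lifted space one plane at z = 1 does.
print(all(classify(x, y) == "inner" for x, y in inner))
print(all(classify(x, y) == "outer" for x, y in outer))
```

In the original two dimensions the condition z < 1 is the inside of the circle x^2 + y^2 = 1, so the linear boundary in 3D corresponds to a circular boundary in 2D.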

[Ref]: https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/#:~:text=A%20support%20vector%20machine%20(SVM,on%20a%20text%20classification%20problem.

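from sklearn.svm import SVC

# features_train, labels_train, features_test and labels_test are assumed to be prepared beforehand
clf = SVC(gamma="auto", kernel="rbf", C=10000.0)
clf.fit(features_train, labels_train)
# Mean accuracy on the held-out test set
accuracy = clf.score(features_test, labels_test)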

## 1% data
## kernel="linear" accuracy=0.88
## kernel="rbf" accuracy=0.61
## kernel="rbf" C=10.0 accuracy=0.61
## kernel="rbf" C=100.0 accuracy=0.61
## kernel="rbf" C=1000.0 accuracy=0.82
## kernel="rbf" C=10000.0 accuracy=0.89

## 35% data
## kernel="rbf" C=10000.0 accuracy=0.96

## 50% data
## kernel="rbf" C=10000.0 accuracy=0.987

## 100% data
## kernel="rbf" C=10000.0 accuracy>0.99
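The notes above vary two things: how much training data is used and the value of C. Since the dataset behind those numbers is not included here, the following is only a sketch of the same kind of sweep on a synthetic stand-in (`make_moons`). The helper names `subsample` and `sweep`, the chosen fractions, and the C values are assumptions, and the accuracies it prints will not match the figures above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not the dataset behind the notes above).
X, y = make_moons(n_samples=1200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

def subsample(fraction):
    """Return a stratified fraction of the training set (all of it if fraction >= 1)."""
    if fraction >= 1.0:
        return X_train, y_train
    X_part, _, y_part, _ = train_test_split(
        X_train, y_train, train_size=fraction, stratify=y_train, random_state=0
    )
    return X_part, y_part

def sweep(fractions=(0.01, 0.5, 1.0), c_values=(10.0, 1000.0, 10000.0)):
    """Train an RBF SVC for every (fraction, C) pair and record test accuracy."""
    results = {}
    for fraction in fractions:
        X_part, y_part = subsample(fraction)
        for c in c_values:
            clf = SVC(kernel="rbf", gamma="auto", C=c)
            clf.fit(X_part, y_part)
            results[(fraction, c)] = clf.score(X_test, y_test)
    return results

results = sweep()
for (fraction, c), accuracy in sorted(results.items()):
    print(f"{fraction:>5.0%} of training data, C={c:<8g} accuracy={accuracy:.3f}")
```

Every configuration is scored on the same held-out test set, so the rows are comparable with each other. Stratified subsampling keeps both classes present even at the 1% setting.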

Original post: https://www.cnblogs.com/Answer1215/p/13193715.html
