Two Probabilistic Models in Machine Learning

Posted: 2015-08-31 21:47:15

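Discriminative models and generative models are two kinds of probabilistic model used in machine learning algorithms to model the probability distribution of the training samples. Because the two are often confused in practice, I looked through the literature and summarize the differences between them here.
Let's open the topic with a description from Stack Overflow: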

Let’s say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution $p(x,y)$ and a discriminative model learns the conditional probability distribution $p(y|x)$ - which you should read as “the probability of $y$ given $x$ ”. Here’s a really simple example. Suppose you have the following data in the form $(x,y): (1,0), (1,0), (2,0), (2, 1)$

$p(x,y)$ is

$p(x,y)$	$y=0$	$y=1$
$x=1$	${1\over2}$	$0$
$x=2$	${1\over4}$	${1\over4}$

$p(y|x)$ is

$p(y|x)$	$y=0$	$y=1$
$x=1$	${1}$	$0$
$x=2$	${1\over2}$	${1\over2}$
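
The two tables above can be reproduced by simple counting. The following is a minimal sketch, not part of the quoted answer; the function names `joint_distribution` and `conditional_distribution` are made up for this illustration, and probabilities are estimated as plain relative frequencies.

```python
from collections import Counter
from fractions import Fraction

# Toy data from the example above, as (x, y) pairs.
data = [(1, 0), (1, 0), (2, 0), (2, 1)]

def joint_distribution(pairs):
    """Estimate p(x, y) as the relative frequency of each (x, y) pair."""
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: Fraction(c, total) for pair, c in counts.items()}

def conditional_distribution(pairs):
    """Estimate p(y | x) by dividing each pair count by the count of its x."""
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    return {(x, y): Fraction(c, x_counts[x]) for (x, y), c in pair_counts.items()}

joint = joint_distribution(data)
cond = conditional_distribution(data)

for x in (1, 2):
    for y in (0, 1):
        # Pairs that were never observed (here only (1, 1)) get probability 0.
        p_xy = joint.get((x, y), Fraction(0))
        p_y_given_x = cond.get((x, y), Fraction(0))
        print(f"x={x} y={y}  p(x,y)={p_xy}  p(y|x)={p_y_given_x}")
```

Note that the joint table sums to 1 over all cells, while the conditional table sums to 1 within each row.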

If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions. The distribution $p(y|x)$ is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly are called discriminative algorithms. Generative algorithms model $p(x,y)$ , which can be transformed into $p(y|x)$ by applying Bayes' rule and then used for classification. However, the distribution $p(x,y)$ can also be used for other purposes. For example, you could use $p(x,y)$ to generate likely $(x,y)$ pairs. From the description above you might be thinking that generative models are more generally useful and therefore better, but it's not as simple as that. The overall gist is that discriminative models generally outperform generative models in classification tasks.

Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes’ rule.
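
As a worked instance of this step (an illustration added here using the toy table from the example above, not part of the quoted text), the conditional follows from the joint by dividing by the marginal $p(x)$:

```latex
p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x, y)}{\sum_{y'} p(x, y')}
\qquad\Longrightarrow\qquad
p(y=1 \mid x=2) = \frac{1/4}{1/4 + 1/4} = \frac{1}{2},
\quad
p(y=0 \mid x=1) = \frac{1/2}{1/2 + 0} = 1 .
```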

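A generative model models the joint probability $p(x,y)$ of the sample data. Because the learned joint probability $p(x,y)$ can be used to generate data pairs $(x,y)$ , it is called a generative model. A discriminative model instead models the conditional probability $p(y|x)$ , that is, the probability of $y$ given $x$ . A generative model can be turned into a discriminative one through Bayes' rule, but a generative model cannot be recovered from a discriminative one, because $p(y|x)$ carries no information about $p(x)$ . Since $p(x, y) = p(x | y) p(y)$ , a generative model (taking binary classification as the example) models the feature distribution of the training samples with $y=0$ and that of the samples with $y=1$ separately, and also models the prior probability $p(y)$ of the labels in the training set. When a new unlabeled sample arrives at test time, it only needs to compute $p(x, y=1) = p(x | y=1)p(y=1)$ and $p(x, y=0) = p(x | y=0)p(y=0)$ in place of $p(y=1 | x)$ and $p(y=0 | x)$ , and compare the two to decide the class. A discriminative model is simpler: it models the posterior probability $p(y | x)$ directly, as in logistic regression and linear regression. At test time the unlabeled sample is fed straight into the model to obtain the corresponding $y$ ; for binary classification, the class is decided by whether the output $p(y=1 | x)$ is greater than $0.5$ .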
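
The two procedures described above can be sketched side by side. This is only an illustration under stated assumptions: the 1-D training set is made up, the class-conditional densities $p(x|y)$ are assumed Gaussian, and the logistic regression is fitted with plain gradient descent using an arbitrarily chosen learning rate and step count. All function names are hypothetical.

```python
import math

# Hypothetical 1-D training set: feature x, binary label y.
X = [1.0, 1.5, 2.0, 2.5, 4.0, 4.5, 5.0, 5.5]
Y = [0, 0, 0, 0, 1, 1, 1, 1]

# --- Generative: model p(x | y) per class (Gaussian, an assumption) and p(y).
def fit_generative(xs, ys):
    params = {}
    for c in sorted(set(ys)):
        pts = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(pts) / len(pts)
        var = sum((p - mean) ** 2 for p in pts) / len(pts)
        params[c] = (mean, var, len(pts) / len(xs))  # (mean, variance, prior)
    return params

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict_generative(params, x):
    # Compare p(x | y) p(y) across classes; p(x) is never needed.
    def score(c):
        mean, var, prior = params[c]
        return gaussian_pdf(x, mean, var) * prior
    return max(params, key=score)

# --- Discriminative: model p(y=1 | x) directly with logistic regression.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average negative log-likelihood.
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        gb = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def predict_logistic(w, b, x):
    # Decide the class by whether p(y=1 | x) exceeds 0.5.
    return 1 if sigmoid(w * x + b) > 0.5 else 0

gen_params = fit_generative(X, Y)
w, b = fit_logistic(X, Y)
for x_new in (2.0, 4.5):
    print(x_new, predict_generative(gen_params, x_new), predict_logistic(w, b, x_new))
```

The generative half stores a mean, variance and prior per class, so it could also be used to draw new samples; the discriminative half stores only `w` and `b`, which describe the decision boundary and nothing else.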

Although this topic is quite old, I think it's worth adding this important distinction. In practice the models are used as follows.

In discriminative models, to predict the label y from the training example x, you must evaluate:

$f(x) = \arg\max_y p(y|x)$
Which merely chooses what is the most likely class considering x. It’s like we were trying to model the decision boundary between the classes. This behavior is very clear in neural networks, where the computed weights can be seen as a complex shaped curve isolating the elements of a class in the space.

Now using Bayes’ rule, let’s replace the $p(y|x)$ in the equation by $\frac{p(x|y)p(y)}{p(x)}$ . Since you are just interested in the arg max, you can wipe out the denominator, that will be the same for every y. So you are left with

$f(x) = \arg\max_y p(x|y)p(y)$
Which is the equation you use in generative models. While in the first case you had the conditional probability distribution $p(y|x)$ , which modeled the boundary between classes, in the second you had the joint probability distribution $p(x, y)$ , since $p(x, y) = p(x | y) p(y)$ , which explicitly models the actual distribution of each class.
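
The claim that dropping the denominator $p(x)$ leaves the arg max unchanged can be checked on the toy joint table from the first example. This is a small sketch added for illustration; the helper names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Joint table p(x, y) from the toy example earlier in the post.
joint = {(1, 0): Fraction(1, 2), (1, 1): Fraction(0),
         (2, 0): Fraction(1, 4), (2, 1): Fraction(1, 4)}
labels = (0, 1)

def prior(y):
    """p(y): marginalize the joint over x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def likelihood(x, y):
    """p(x | y) = p(x, y) / p(y)."""
    return joint[(x, y)] / prior(y)

def evidence(x):
    """p(x): marginalize the joint over y."""
    return sum(joint[(x, y)] for y in labels)

def posterior(y, x):
    """p(y | x) via Bayes' rule."""
    return likelihood(x, y) * prior(y) / evidence(x)

def classify_discriminative(x):
    # arg max over y of p(y | x)
    return max(labels, key=lambda y: posterior(y, x))

def classify_generative(x):
    # arg max over y of p(x | y) p(y); no division by p(x)
    return max(labels, key=lambda y: likelihood(x, y) * prior(y))

for x in (1, 2):
    print(x, classify_discriminative(x), classify_generative(x))
```

For $x=2$ the two labels are tied at $p(y|x) = 1/2$; `max` returns the first maximal label in both functions, so the two rules still agree.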

With the joint probability distribution function, given a y, you can calculate (“generate”) its respective x. For this reason they are called generative models.
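
As a rough sketch of what "generating" means here, the toy joint table from the first example can be sampled directly, either as whole $(x,y)$ pairs or as values of $x$ for a fixed $y$. The function names and the seed are arbitrary choices for this illustration.

```python
import random

# Joint table p(x, y) from the toy example; (1, 1) has probability 0 and is omitted.
joint = {(1, 0): 0.5, (2, 0): 0.25, (2, 1): 0.25}

def sample_pairs(n, rng):
    """Draw n (x, y) pairs from the joint distribution."""
    pairs = list(joint)
    weights = [joint[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=n)

def sample_x_given_y(y, n, rng):
    """Draw n values of x from p(x | y), i.e. the joint restricted to label y."""
    pairs = [p for p in joint if p[1] == y]
    weights = [joint[p] for p in pairs]  # choices() treats these as relative weights
    return [x for x, _ in rng.choices(pairs, weights=weights, k=n)]

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
print(sample_pairs(5, rng))
print(sample_x_given_y(1, 5, rng))
```

A model of $p(y|x)$ alone could not do this, since it says nothing about how likely each $x$ is.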

Imagine your task is to identify the language of a speech sample.

You can do it either by:

1) Learning each language and then classifying it using the knowledge you just gained

2) Determining the difference in the linguistic models without learning the languages and then classifying the speech.

The first is the generative approach and the second is the discriminative approach.

Examples of discriminative models used in machine learning include:

Logistic regression
Support vector machines
Boosting (meta-algorithm)
Conditional random fields
Linear regression
Neural networks

Examples of generative models include:

Gaussian mixture model and other types of mixture model
Hidden Markov model
Probabilistic context-free grammar
Naive Bayes
Averaged one-dependence estimators
Latent Dirichlet allocation
Restricted Boltzmann machine

2015-8-31 艺少

Tags: machine learning, generative model, discriminative model

Original post: http://blog.csdn.net/lg1259156776/article/details/48139381
