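Suppose there are two classes \(C_{1}\) and \(C_{2}\). By Bayes' rule, \(P(C_{1}|x)=\frac{P(x|C_{1})P(C_{1})}{P(x|C_{1})P(C_{1})+P(x|C_{2})P(C_{2})}\), where \(P(C_{1})\) and \(P(C_{2})\) are the class priors and the class-conditional densities \(P(x|C_{1})\) and \(P(x|C_{2})\) are both Gaussian, with density: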
\[f_{\mu ,\Sigma }(x)=\frac{1}{(2\pi )^{D/2}|\Sigma |^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma ^{-1}(x-\mu)\right) \]
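From the labeled data we can estimate the mean and covariance of each class. Since the \(N\) samples of a class are assumed to be drawn from a Gaussian, their likelihood is: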
\[L(\mu ,\Sigma )=f_{\mu ,\Sigma }(x^{1})f_{\mu ,\Sigma }(x^{2})f_{\mu ,\Sigma }(x^{3})\cdots f_{\mu ,\Sigma }(x^{N}) \]
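Maximizing the likelihood gives:
\[\mu ^{*}=\frac{1}{N}\sum_{n=1}^{N}x^{n},\qquad \Sigma ^{*}=\frac{1}{N}\sum_{n=1}^{N}(x^{n}-\mu ^{*})(x^{n}-\mu ^{*})^{T} \]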
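The maximum-likelihood estimates can be computed directly from the samples of one class. The following is a minimal sketch using only the Python standard library; the function name `gaussian_mle` and the sample points are hypothetical, and the covariance is assumed to be the usual maximum-likelihood form that divides by \(N\) (not \(N-1\)).

```python
def gaussian_mle(samples):
    """Return the maximum-likelihood mean vector and covariance matrix.

    samples: list of equal-length lists of floats (one list per sample).
    The covariance divides by N, which is the maximum-likelihood form.
    """
    n = len(samples)
    d = len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    cov = [
        [sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / n
         for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


# Hypothetical 2-D samples belonging to a single class.
points = [[1.0, 2.0], [3.0, 2.0], [1.0, 4.0], [3.0, 4.0]]
mu, sigma = gaussian_mle(points)
print(mu)     # estimated mean vector
print(sigma)  # estimated covariance matrix
```

For these symmetric points the off-diagonal covariance entries come out as zero, which matches the diagonal-covariance assumption used later in the derivation.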
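The derivation here assumes that the class-conditional densities \(P(x|C_{1})\) and \(P(x|C_{2})\) share the same covariance matrix \(\Sigma\), and that \(\Sigma\) is diagonal (i.e., the features are assumed to be mutually independent). Under these assumptions the posterior is a sigmoid of a linear function of \(x\); the proof is as follows: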
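The posterior is \(P(C_{1}|x)=\frac{P(x|C_{1})P(C_{1})}{P(x|C_{1})P(C_{1})+P(x|C_{2})P(C_{2})}=\frac{1}{1+\frac{P(x|C_{2})P(C_{2})}{P(x|C_{1})P(C_{1})}}=\frac{1}{1+e^{-z}}=\sigma (z)\), with \(z=\ln\frac{P(x|C_{1})P(C_{1})}{P(x|C_{2})P(C_{2})}\).
Here \(\sigma (z)\) is the sigmoid function, which maps any real \(z\) into the interval \((0,1)\).
Splitting the logarithm, \(z=\ln\frac{P(x|C_{1})}{P(x|C_{2})}+\ln\frac{P(C_{1})}{P(C_{2})}\), where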
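The identity between the Bayes posterior and \(\sigma(z)\) holds for any pair of likelihoods, and it is easy to check numerically. The sketch below uses 1-D Gaussians; all numeric values (priors, means, variances, the query point) are arbitrary assumptions chosen only for illustration.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical priors and class-conditional parameters.
prior1, prior2 = 0.3, 0.7
x = 0.8
lik1 = gaussian_pdf(x, mu=1.0, var=1.5)   # P(x | C1)
lik2 = gaussian_pdf(x, mu=-1.0, var=0.5)  # P(x | C2)

# Posterior computed directly from Bayes' rule.
posterior_bayes = lik1 * prior1 / (lik1 * prior1 + lik2 * prior2)

# Posterior computed as sigma(z), z = ln(lik1/lik2) + ln(prior1/prior2).
z = math.log(lik1 / lik2) + math.log(prior1 / prior2)
posterior_sigmoid = sigmoid(z)

print(posterior_bayes, posterior_sigmoid)  # the two values agree up to rounding
```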
\(\ln\frac{P(x|C_{1})}{P(x|C_{2})}=\ln\frac{\frac{1}{(2\pi )^{D/2}|\Sigma ^{1}|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu ^{1})^{T}(\Sigma^{1}) ^{-1}(x-\mu ^{1})\right)}{\frac{1}{(2\pi )^{D/2}|\Sigma ^{2}|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu ^{2})^{T}(\Sigma^{2}) ^{-1}(x-\mu ^{2})\right)}\)
\(=\ln\left[\frac{|\Sigma ^{2}|^{1/2}}{|\Sigma ^{1}|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu ^{1})^{T}(\Sigma^{1}) ^{-1}(x-\mu ^{1})+\frac{1}{2}(x-\mu ^{2})^{T}(\Sigma^{2}) ^{-1}(x-\mu ^{2})\right)\right]\)
\(=\ln\frac{|\Sigma ^{2}|^{1/2}}{|\Sigma ^{1}|^{1/2}}-\frac{1}{2}\left[(x-\mu ^{1})^{T}(\Sigma^{1}) ^{-1}(x-\mu ^{1})-(x-\mu ^{2})^{T}(\Sigma^{2}) ^{-1}(x-\mu ^{2})\right]\)
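where, since \(\Sigma^{1}\) is symmetric,
\[(x-\mu ^{1})^{T}(\Sigma^{1}) ^{-1}(x-\mu ^{1})=x^{T}(\Sigma ^{1})^{-1}x-2(\mu ^{1})^{T}(\Sigma ^{1})^{-1}x+(\mu ^{1})^{T}(\Sigma ^{1})^{-1}\mu ^{1} \]
and similarly
\[(x-\mu ^{2})^{T}(\Sigma^{2}) ^{-1}(x-\mu ^{2})=x^{T}(\Sigma ^{2})^{-1}x-2(\mu ^{2})^{T}(\Sigma ^{2})^{-1}x+(\mu ^{2})^{T}(\Sigma ^{2})^{-1}\mu ^{2} \]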
Because we assumed \(\Sigma^{1}=\Sigma^{2}=\Sigma\), we have \(\ln\frac{|\Sigma ^{2}|^{1/2}}{|\Sigma ^{1}|^{1/2}}=0\) and \(x^{T}(\Sigma ^{1})^{-1}x=x^{T}(\Sigma ^{2})^{-1}x\), so the quadratic terms in \(x\) cancel and
\[z=(\mu^{1})^{T}\Sigma ^{-1}x-\frac{1}{2}(\mu ^{1})^{T}\Sigma ^{-1}\mu ^{1}-(\mu^{2})^{T}\Sigma ^{-1}x+\frac{1}{2}(\mu ^{2})^{T}\Sigma ^{-1}\mu ^{2}+\ln\frac{P(C_{1})}{P(C_{2})} \]
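Collecting the terms that multiply \(x\) into \(w^{T}\) and the remaining constants into \(b\), we get \(P(C_{1}|x)=\sigma (w^{T}x+b)\): the posterior is a sigmoid applied to a linear function of \(x\).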
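To make the result concrete, the sketch below builds \(w\) and \(b\) from the class means, a shared diagonal covariance, and the priors, then checks that \(\sigma(w^{T}x+b)\) reproduces the Bayes posterior. The parameter values and helper names (`diag_gaussian_pdf`, `m1`, `m2`, `var`) are assumptions made for illustration; the diagonal covariance is stored as a list of per-feature variances.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def diag_gaussian_pdf(x, mu, var):
    """Density of a Gaussian whose covariance is diag(var)."""
    d = len(x)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
    norm = (2 * math.pi) ** (d / 2) * math.sqrt(math.prod(var))
    return math.exp(-0.5 * quad) / norm


# Hypothetical generative parameters: two means, one shared diagonal covariance.
m1, m2 = [1.0, 2.0], [-1.0, 0.5]
var = [2.0, 0.5]
p1, p2 = 0.4, 0.6

# w^T = (mu1 - mu2)^T Sigma^{-1}; for a diagonal Sigma this is element-wise.
w = [(a - c) / v for a, c, v in zip(m1, m2, var)]
# b = -1/2 mu1^T Sigma^{-1} mu1 + 1/2 mu2^T Sigma^{-1} mu2 + ln(P(C1)/P(C2))
b = (-0.5 * sum(a * a / v for a, v in zip(m1, var))
     + 0.5 * sum(c * c / v for c, v in zip(m2, var))
     + math.log(p1 / p2))

x = [0.3, 1.1]
posterior_linear = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

num = diag_gaussian_pdf(x, m1, var) * p1
posterior_bayes = num / (num + diag_gaussian_pdf(x, m2, var) * p2)

print(posterior_linear, posterior_bayes)  # equal up to floating-point rounding
```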
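With the generative model we have to estimate \(\mu^{1}\), \(\mu^{2}\) and \(\Sigma\); now we can instead estimate \(w\) and \(b\) directly, by maximizing the likelihood of the training labels: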
\[L (w,b)=f_{w,b}(x^{1})f_{w,b}(x^{2})(1-f_{w,b}(x^{3}))\cdots f_{w,b}(x^{N}) \]
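In the expression above, \(f_{w,b}(x)=\sigma (w^{T}x+b)\) is the model's estimate of \(P(C_{1}|x)\). In this example \(x^{1}\) and \(x^{2}\) belong to \(C_{1}\) while \(x^{3}\) belongs to \(C_{2}\), which is why the third factor is \(1-f_{w,b}(x^{3})\). Define \(\hat{y}^{n}=1\) if \(x^{n}\) belongs to \(C_{1}\) and \(\hat{y}^{n}=0\) if it belongs to \(C_{2}\). Maximizing \(L(w,b)\) is equivalent to minimizing \(-\ln L(w,b)\):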
\[\begin{aligned}-\ln L (w,b)&=-[\hat{y}^{1}\ln f(x^{1})+(1-\hat{y}^{1})\ln(1-f(x^{1}))]\\&\quad-[\hat{y}^{2}\ln f(x^{2})+(1-\hat{y}^{2})\ln(1-f(x^{2}))]\\&\quad-[\hat{y}^{3}\ln f(x^{3})+(1-\hat{y}^{3})\ln(1-f(x^{3}))]\\&\quad\cdots \\&=\sum_{n}-[\hat{y}^{n}\ln f(x^{n})+(1-\hat{y}^{n})\ln(1-f(x^{n}))]\end{aligned} \]
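Each term of this sum is the cross-entropy between two Bernoulli distributions, the label distribution \(p\) (with \(p(1)=\hat{y}^{n}\)) and the model distribution \(q\) (with \(q(1)=f(x^{n})\)), so the expression above is the cross-entropy loss function, where in general:
\[H(p,q)=-\sum_{x}p(x)\ln q(x) \]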
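As a quick check that the summed cross-entropy equals the negative log-likelihood, the sketch below evaluates both for three samples labeled 1, 1, 0, mirroring the pattern of the likelihood product. The predicted probabilities are made-up values and `cross_entropy_loss` is a hypothetical helper name.

```python
import math


def cross_entropy_loss(y_true, y_pred):
    """Sum over samples of -[y ln f + (1 - y) ln(1 - f)]."""
    return sum(
        -(y * math.log(f) + (1 - y) * math.log(1 - f))
        for y, f in zip(y_true, y_pred)
    )


labels = [1, 1, 0]       # x^1, x^2 in C1; x^3 in C2
preds = [0.9, 0.8, 0.3]  # hypothetical outputs f(x^1), f(x^2), f(x^3)

loss = cross_entropy_loss(labels, preds)
likelihood = 0.9 * 0.8 * (1 - 0.3)  # f(x^1) f(x^2) (1 - f(x^3))

print(loss, -math.log(likelihood))  # the two numbers coincide up to rounding
```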
Next, differentiate with respect to the parameters:
\[-\frac{\partial \ln L(w,b)}{\partial w_{i}}=\sum_{n}-\left[\hat{y}^{n}\frac{\partial \ln f(x^{n})}{\partial w_{i}}+(1-\hat{y}^{n})\frac{\partial \ln(1-f(x^{n}))}{\partial w_{i}}\right] \]
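Since \(\frac{\partial \sigma (z)}{\partial z}=\sigma (z)(1-\sigma (z))\) and \(\frac{\partial z}{\partial w_{i}}=x_{i}\), we have \(\frac{\partial \ln f(x^{n})}{\partial w_{i}}=(1-f(x^{n}))x_{i}^{n}\) and \(\frac{\partial \ln(1-f(x^{n}))}{\partial w_{i}}=-f(x^{n})x_{i}^{n}\).
Therefore \(-\frac{\partial \ln L(w,b)}{\partial w_{i}}=\sum_{n}-[\hat{y}^{n}(1-f_{w,b}(x^{n}))x_{i}^{n}-(1-\hat{y}^{n})f_{w,b}(x^{n})x_{i}^{n}]=\sum_{n}-(\hat{y}^{n}-f_{w,b}(x^{n}))x_{i}^{n}\)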
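The gradient-descent update, with learning rate \(\eta\), is then:
\[w_{i}\leftarrow w_{i}-\eta \sum_{n}-(\hat{y}^{n}-f_{w,b}(x^{n}))x_{i}^{n} \]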
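The gradient derived above leads to a very short training loop. The following is a minimal sketch of batch gradient descent, not a tuned implementation: the function name `train_logistic`, the learning rate, the step count, and the toy data are all assumptions, and the bias gradient is taken to be the same expression with \(x_{i}^{n}\) replaced by 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, steps=2000):
    """Batch gradient descent on the cross-entropy loss."""
    d = len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            f = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = f - y  # equals -(y_hat - f), the per-sample factor in the gradient
            for i in range(d):
                grad_w[i] += err * x[i]
            grad_b += err  # bias behaves like a weight on a constant feature 1
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical 1-D toy data: small inputs are class 0, large inputs are class 1.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
preds = [sigmoid(w[0] * x[0] + b) for x in xs]
print(w, b)
print(preds)  # probabilities below 0.5 for the first two points, above for the last two
```

Note that the update size is proportional to the error \(f_{w,b}(x^{n})-\hat{y}^{n}\): the further the prediction is from the label, the larger the step.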
Suppose we used the squared error as the loss function instead:
\[L (w,b)=\frac{1}{2}\sum_{n} (f_{w,b}(x^{n})-\hat{y}^{n})^{2} \]
Differentiating one term with respect to the parameters:
\[\frac{\partial (f_{w,b}(x^{n})-\hat{y}^{n})^{2}}{\partial w_{i}}=2(f_{w,b}(x^{n})-\hat{y}^{n})\frac{\partial f_{w,b}(x^{n})}{\partial z}\frac{\partial z}{\partial w_{i}}=2(f_{w,b}(x^{n})-\hat{y}^{n}){\color{Red} {f_{w,b}(x^{n})(1-f_{w,b}(x^{n}))}}x_{i}^{n} \]
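Because of the red factor, the gradient vanishes at both extremes: \(\frac{\partial L}{\partial w_{i}}=0\) when \(f_{w,b}(x^{n})=0\), and also when \(f_{w,b}(x^{n})=1\). So if, for example, \(\hat{y}^{n}=1\) but the model outputs a value close to 0, the gradient is nearly zero even though the prediction is far from the target, and gradient descent makes almost no progress.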
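The difference between the two gradients is easy to see numerically. The sketch below evaluates both per-sample gradients for a badly wrong prediction (target 1, output close to 0); the function names and the specific numbers are illustrative assumptions.

```python
def squared_error_grad(f, y, x_i):
    """d/dw_i of (f - y)^2 when f is a sigmoid output."""
    return 2 * (f - y) * f * (1 - f) * x_i


def cross_entropy_grad(f, y, x_i):
    """d/dw_i of -[y ln f + (1 - y) ln(1 - f)] when f is a sigmoid output."""
    return -(y - f) * x_i


# Target is 1, but the model output is almost 0 (a badly wrong prediction).
f, y, x_i = 1e-6, 1, 1.0
se = squared_error_grad(f, y, x_i)
ce = cross_entropy_grad(f, y, x_i)
print(se)  # tiny in magnitude: the f(1 - f) factor suppresses the gradient
print(ce)  # close to -1: the gradient stays large when the prediction is wrong
```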
(Figure: comparison of the cross-entropy and mean squared error losses.)
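A discriminative model estimates \(w\) and \(b\) directly.
A generative model estimates \(\mu^{1}\), \(\mu^{2}\) and \(\Sigma\), and then obtains \(w\) and \(b\) from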
\[w^{T}=(\mu ^{1}-\mu ^{2})^{T}\Sigma ^{-1},\qquad b=-\frac{1}{2}(\mu ^{1})^{T}\Sigma ^{-1}\mu ^{1}+\frac{1}{2}(\mu ^{2})^{T}\Sigma ^{-1}\mu ^{2}+\ln\frac{P(C_{1})}{P(C_{2})} \]
In general, the \(w\) and \(b\) obtained by the two approaches are not necessarily the same.
Advantages of the generative model:
Original article: https://www.cnblogs.com/CcQun/p/13362483.html