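\(p(\mathcal{D})\): the marginal likelihood (normalizing constant). It does not depend on \(\theta\): it is the marginal distribution of the data, obtained by integrating \(\theta\) out, so it is unaffected by any particular parameter value.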
We will cover the following parts:
Point estimates, e.g., the posterior mean or the posterior median.
Uncertainty: a measure of confidence in our parameter estimates (a small sample size or low-quality data may lead to larger uncertainty).
We use a \(100(1-\alpha)\%\) credible interval, which contains \(1-\alpha\) of the posterior probability mass.
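As a minimal sketch of these summaries, the snippet below computes a posterior mean, a posterior median, and a central 95% credible interval from posterior samples. The Beta(8, 4) posterior is a hypothetical choice made only for illustration; it does not come from the notes. The central (equal-tailed) interval is one common way to build a credible interval, not the only one.

```python
import numpy as np

# Assumption for illustration only: the posterior is Beta(8, 4),
# e.g. the heads probability of a coin after some observations.
rng = np.random.default_rng(0)
samples = rng.beta(8.0, 4.0, size=100_000)

# Point estimates.
posterior_mean = samples.mean()
posterior_median = np.median(samples)

# Central 100(1 - alpha)% credible interval: cut alpha/2 from each tail.
alpha = 0.05
lower, upper = np.quantile(samples, [alpha / 2, 1 - alpha / 2])

# Fraction of posterior samples falling inside the interval (about 1 - alpha).
mass_inside = np.mean((samples >= lower) & (samples <= upper))

print(f"mean={posterior_mean:.3f}, median={posterior_median:.3f}")
print(f"95% credible interval=({lower:.3f}, {upper:.3f}), mass={mass_inside:.3f}")
```

With fewer observations the posterior would be wider, and the interval would widen with it.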
In this section, we consider a set of (prior, likelihood) pairs for which we can compute the posterior in closed form; that is, we focus on the cases where the posterior has an explicit solution.
In particular, we will use priors that are "conjugate" to the likelihood.
We say that a prior \(p(\boldsymbol{\theta}) \in \mathcal{F}\) is a conjugate prior for a likelihood function \(p(\mathcal{D} \mid \boldsymbol{\theta})\) if the posterior is in the same parameterized family as the prior, i.e., \(p(\boldsymbol{\theta} \mid \mathcal{D}) \in \mathcal{F}\).
"Conjugate" means that the prior and the posterior belong to the same family of distributions.
In other words, the family \(\mathcal{F}\) is closed under Bayesian updating. If the family \(\mathcal{F}\) corresponds to the exponential family, then the computations can be performed in closed form.
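Let \(Y \sim \operatorname{Cat}(\boldsymbol{\theta})\) be a discrete random variable drawn from a categorical distribution. The likelihood has the form
\[
p(\mathcal{D} \mid \boldsymbol{\theta})=\prod_{n=1}^{N} \operatorname{Cat}\left(y_{n} \mid \boldsymbol{\theta}\right)=\prod_{n=1}^{N} \prod_{c=1}^{C} \theta_{c}^{\mathbb{I}\left(y_{n}=c\right)}=\prod_{c=1}^{C} \theta_{c}^{N_{c}}
\]
where \(N_{c}=\sum_{n} \mathbb{I}\left(y_{n}=c\right)\) is the number of times class \(c\) appears.
e.g., for a six-sided die (\(C=6\), \(N\) observations), \(N_c\) is the number of times face \(c\) appears among the \(N\) observations.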
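The conjugate prior for a categorical distribution is the Dirichlet distribution, which is a multivariate generalization of the beta distribution. This has support over the probability simplex, defined by
\[
S_{K}=\left\{\boldsymbol{\theta}: 0 \leq \theta_{k} \leq 1, \ \sum_{k=1}^{K} \theta_{k}=1\right\}
\]
The pdf of the Dirichlet is defined as follows:
\[
\operatorname{Dir}(\boldsymbol{\theta} \mid \breve{\boldsymbol{\alpha}})=\frac{1}{B(\breve{\boldsymbol{\alpha}})} \prod_{k=1}^{K} \theta_{k}^{\breve{\alpha}_{k}-1} \, \mathbb{I}\left(\boldsymbol{\theta} \in S_{K}\right)
\]
where \(B(\breve{\boldsymbol{\alpha}})\) is the multivariate beta function,
\[
B(\breve{\boldsymbol{\alpha}})=\frac{\prod_{k=1}^{K} \Gamma\left(\breve{\alpha}_{k}\right)}{\Gamma\left(\sum_{k=1}^{K} \breve{\alpha}_{k}\right)}
\]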
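\(\breve{\boldsymbol{\alpha}}\): the hyperparameters. They are given in advance and are used to infer the posterior distribution of the parameters.
The Dirichlet distribution generalizes the beta distribution: it is a distribution over a whole vector of probabilities (the parameters of a categorical distribution), whereas the beta is a distribution over a single probability.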
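We can combine the multinomial likelihood and Dirichlet prior to compute the posterior, as follows (indexing the categories by \(k=1,\dots,K\), with \(K=C\)):
\[
p(\boldsymbol{\theta} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \boldsymbol{\theta}) \operatorname{Dir}(\boldsymbol{\theta} \mid \breve{\boldsymbol{\alpha}}) \propto\left[\prod_{k} \theta_{k}^{N_{k}}\right]\left[\prod_{k} \theta_{k}^{\breve{\alpha}_{k}-1}\right] \propto \operatorname{Dir}\left(\boldsymbol{\theta} \mid \breve{\alpha}_{1}+N_{1}, \ldots, \breve{\alpha}_{K}+N_{K}\right)=\operatorname{Dir}(\boldsymbol{\theta} \mid \widehat{\boldsymbol{\alpha}})
\]
where \(\widehat{\alpha}_{k}=\breve{\alpha}_{k}+N_{k}\) are the parameters of the posterior.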
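The update \(\widehat{\alpha}_{k}=\breve{\alpha}_{k}+N_{k}\) amounts to adding the observed counts to the prior hyperparameters. The sketch below illustrates this for the die example; the rolls, the uniform prior \(\breve{\alpha}_k = 1\), and the function name `dirichlet_posterior` are assumptions made for illustration.

```python
import numpy as np

def dirichlet_posterior(prior_alpha, y, num_classes):
    """Return posterior Dirichlet parameters: prior_alpha_k + N_k."""
    # N_k: number of times class k appears in the data.
    counts = np.bincount(y, minlength=num_classes)
    return prior_alpha + counts

# Hypothetical data: 10 rolls of a six-sided die, faces coded 0..5.
y = np.array([0, 2, 2, 5, 1, 2, 3, 5, 5, 2])
prior_alpha = np.ones(6)  # assumed uniform Dirichlet prior

post_alpha = dirichlet_posterior(prior_alpha, y, num_classes=6)

# Posterior mean of theta: alpha_hat_k / sum_j alpha_hat_j.
post_mean = post_alpha / post_alpha.sum()

print("posterior alpha:", post_alpha)
print("posterior mean :", np.round(post_mean, 3))
```

Face 4 never appears in this data, yet its posterior mean stays above zero because the prior contributes a pseudo-count.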
Here we only discuss the distribution of the variance \(\sigma^{2}\) when the mean \(\mu\) is known.
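Likelihood:
If \(\mu\) is a known constant, the likelihood for \(\sigma^{2}\) has the form
\[
p\left(\mathcal{D} \mid \sigma^{2}\right) \propto \frac{1}{\left(\sigma^{2}\right)^{N / 2}} \exp \left(-\frac{1}{2 \sigma^{2}} \sum_{n=1}^{N}\left(y_{n}-\mu\right)^{2}\right)
\]
where we can no longer ignore the \(1 /\left(\sigma^{2}\right)\) term in front.
Prior:
The standard conjugate prior is the inverse Gamma distribution (the distribution of the reciprocal of a Gamma-distributed variable), given by
\[
\operatorname{IG}\left(\sigma^{2} \mid \breve{a}, \breve{b}\right)=\frac{\breve{b}^{\breve{a}}}{\Gamma(\breve{a})}\left(\sigma^{2}\right)^{-(\breve{a}+1)} \exp \left(-\frac{\breve{b}}{\sigma^{2}}\right)
\]
Posterior:
Multiplying the likelihood and the prior, we see that the posterior is also IG:
\[
p\left(\sigma^{2} \mid \mu, \mathcal{D}\right) \propto\left(\sigma^{2}\right)^{-\left(\breve{a}+\frac{N}{2}+1\right)} \exp \left(-\frac{\breve{b}+\frac{1}{2} \sum_{n=1}^{N}\left(y_{n}-\mu\right)^{2}}{\sigma^{2}}\right) \propto \operatorname{IG}\left(\sigma^{2} \mid \widehat{a}, \widehat{b}\right)
\]
i.e. \(\operatorname{IG}\left(\frac{N}{2}+\breve{a},\ \breve{b}+\frac{1}{2}\sum_{n=1}^N (y_n-\mu)^2\right)\).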
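As a rough numerical illustration of this update, the sketch below adds \(N/2\) to the prior shape and half the sum of squared deviations to the prior scale. The data, the prior values (shape 1, scale 1), and the function name `inverse_gamma_posterior` are hypothetical. The posterior mean uses the standard inverse Gamma mean, scale / (shape - 1), valid when the shape exceeds 1.

```python
import numpy as np

def inverse_gamma_posterior(prior_a, prior_b, y, mu):
    """Posterior IG(shape, scale) for sigma^2 when the mean mu is known."""
    y = np.asarray(y, dtype=float)
    post_a = prior_a + y.size / 2.0                   # shape: a + N/2
    post_b = prior_b + 0.5 * np.sum((y - mu) ** 2)    # scale: b + 0.5 * sum (y_n - mu)^2
    return post_a, post_b

# Hypothetical observations with known mean mu = 0.
y = [1.0, -1.0, 2.0, -2.0]
mu = 0.0

post_a, post_b = inverse_gamma_posterior(prior_a=1.0, prior_b=1.0, y=y, mu=mu)

# Mean of an inverse Gamma distribution: scale / (shape - 1), for shape > 1.
post_mean = post_b / (post_a - 1.0)

print(f"posterior shape={post_a}, scale={post_b}, mean of sigma^2={post_mean}")
```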
In general, the posterior is not available in closed form, so we have to use approximate inference methods.
"All models are wrong, but some are useful." (George Box)
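We assume we have a set of models \(\mathcal{M}\).
Objective: we want to choose the best model from the set \(\mathcal{M}\), i.e., the most probable one given the data:
\[
\hat{m}=\underset{m \in \mathcal{M}}{\operatorname{argmax}} \ p(m \mid \mathcal{D}), \qquad p(m \mid \mathcal{D})=\frac{p(\mathcal{D} \mid m) \, p(m)}{\sum_{m^{\prime} \in \mathcal{M}} p\left(\mathcal{D} \mid m^{\prime}\right) p\left(m^{\prime}\right)}
\]
where
\(m\): a model
\(\mathcal{D}\): the data
\(p(\mathcal{D} \mid m)\): the marginal likelihood of the data once model \(m\) has been chosen (it plays the role of \(P(\mathcal{D})\) in Bayes' rule \(P(\theta \mid \mathcal{D})=\frac{P(\theta) \cdot P(\mathcal{D} \mid \theta)}{P(\mathcal{D})}\)).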
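If the prior over models is uniform, \(p(m)=1 /|\mathcal{M}|\), then the MAP model is given by
\[
\hat{m}=\underset{m \in \mathcal{M}}{\operatorname{argmax}} \ p(\mathcal{D} \mid m)
\]
A uniform prior reflects the absence of extra information: we consider every model equally likely.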
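The quantity \(p(\mathcal{D} \mid m)\) is given by
\[
p(\mathcal{D} \mid m)=\int p(\mathcal{D} \mid \boldsymbol{\theta}, m) \, p(\boldsymbol{\theta} \mid m) \, d \boldsymbol{\theta}
\]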
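To make model selection by marginal likelihood concrete, here is a small sketch comparing two hypothetical models for a sequence of coin tosses. The "fair" model fixes \(\theta = 0.5\). The "beta" model places a Beta(a, b) prior on \(\theta\), for which the integral over \(\theta\) has the closed form \(B(a+N_1, b+N_0)/B(a, b)\). Both models, the function names, and the counts are assumptions chosen for illustration.

```python
from math import lgamma, log, exp

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence_fair(n_heads, n_tails):
    """log p(D | m) when theta is fixed at 0.5 (no free parameter)."""
    return (n_heads + n_tails) * log(0.5)

def log_evidence_beta(n_heads, n_tails, a=1.0, b=1.0):
    """log p(D | m) with theta ~ Beta(a, b): integral of theta^N1 (1-theta)^N0 p(theta)."""
    return log_beta(a + n_heads, b + n_tails) - log_beta(a, b)

def map_model(n_heads, n_tails):
    """With a uniform prior over models, pick the model with the largest evidence."""
    log_ev = {
        "fair": log_evidence_fair(n_heads, n_tails),
        "beta": log_evidence_beta(n_heads, n_tails),
    }
    return max(log_ev, key=log_ev.get), log_ev

for heads, tails in [(9, 1), (5, 5)]:
    best, log_ev = map_model(heads, tails)
    print(heads, tails, best, {k: round(exp(v), 6) for k, v in log_ev.items()})
```

Balanced data favors the simpler fair-coin model, while strongly skewed data favors the model with a free parameter.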
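If our goal is to perform prediction, we can get better results if we marginalize out over all models, by computing
\[
p(y \mid \boldsymbol{x}, \mathcal{D})=\sum_{m \in \mathcal{M}} p(y \mid \boldsymbol{x}, m, \mathcal{D}) \, p(m \mid \mathcal{D})
\]
Here \(\mathcal{D}\) is the training data and \((\boldsymbol{x}, y)\) is a new data point; the goal is to predict \(y\).
Disadvantage: this is computationally very expensive.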
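The model-averaged prediction can be sketched for a toy case with only two models, where the sum over models is cheap. As an assumption for illustration, the models are a fair coin (\(\theta = 0.5\)) and a coin with a Beta(a, b) prior on \(\theta\), with a uniform prior over the two models; there is no input \(\boldsymbol{x}\) in this toy case. The function name `bma_predict_heads` is hypothetical.

```python
from math import lgamma, log, exp

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bma_predict_heads(n_heads, n_tails, a=1.0, b=1.0):
    """Model-averaged probability that the next toss is heads."""
    n = n_heads + n_tails

    # Log marginal likelihood of each model.
    log_ev_fair = n * log(0.5)
    log_ev_beta = log_beta(a + n_heads, b + n_tails) - log_beta(a, b)

    # Posterior over models under a uniform model prior (normalized stably).
    shift = max(log_ev_fair, log_ev_beta)
    w_fair = exp(log_ev_fair - shift)
    w_beta = exp(log_ev_beta - shift)
    post_fair = w_fair / (w_fair + w_beta)
    post_beta = w_beta / (w_fair + w_beta)

    # Per-model predictive probability of heads.
    pred_fair = 0.5
    pred_beta = (a + n_heads) / (a + b + n)  # posterior mean of theta

    # Weighted average of the per-model predictions.
    p_heads = post_fair * pred_fair + post_beta * pred_beta
    return p_heads, (post_fair, post_beta)

p_heads, (post_fair, post_beta) = bma_predict_heads(9, 1)
print(f"p(m=fair|D)={post_fair:.3f}, p(m=beta|D)={post_beta:.3f}, p(heads)={p_heads:.3f}")
```

The averaged prediction lies between the two per-model predictions, weighted by how well each model explains the data. With many models or intractable evidence integrals, this sum becomes the expensive part.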
[Predictive Analytics, Spring 2021] Chapter 4 Bayesian statistics
Original source: https://www.cnblogs.com/YuzifeiYu/p/14508016.html