Chapter 1-1 : Introduction
Christopher M. Bishop, PRML
Here the model is represented by parameters $\mathbf{w}$; for an unseen input $x$, it makes a prediction $y(x, \mathbf{w})$.
Error function:
$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2 = \frac{1}{2}\|\mathbf{X}\mathbf{w} - \mathbf{t}\|^2,$$
where $\mathbf{X}$ is the $N \times (M+1)$ design matrix with entries $X_{nj} = x_n^{\,j}$.
Finding the solution by differentiation:
Note: matrix differentiation, $\frac{\partial}{\partial\mathbf{w}}\left(\mathbf{a}^{\mathsf{T}}\mathbf{w}\right) = \mathbf{a}$, and $\frac{\partial}{\partial\mathbf{w}}\left(\mathbf{w}^{\mathsf{T}}\mathbf{A}\mathbf{w}\right) = (\mathbf{A} + \mathbf{A}^{\mathsf{T}})\mathbf{w}$.
We get
$$\nabla E(\mathbf{w}) = \mathbf{X}^{\mathsf{T}}\mathbf{X}\mathbf{w} - \mathbf{X}^{\mathsf{T}}\mathbf{t} = \mathbf{0}.$$
The optimal parameter is $\mathbf{w}^{\star} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{t}$.
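A minimal NumPy sketch of this closed-form fit (my own illustration, not the book's code; the helper names `design_matrix` and `fit_polynomial` are assumptions, and `np.linalg.lstsq` is used rather than inverting $\mathbf{X}^{\mathsf{T}}\mathbf{X}$ explicitly, which is numerically safer):

```python
import numpy as np

def design_matrix(x, M):
    """N x (M+1) design matrix with entries X[n, j] = x_n ** j."""
    return np.vander(x, M + 1, increasing=True)

def fit_polynomial(x, t, M):
    """Minimize E(w) = 0.5 * ||X w - t||^2 in closed form."""
    X = design_matrix(x, M)
    # Least-squares solution of X w ~= t, i.e. the solution of the
    # normal equations X^T X w = X^T t.
    w_star, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w_star
```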
Given a training data set comprising $N$ observations of $x$, written $\mathbf{x} \equiv (x_1, \dots, x_N)^{\mathsf{T}}$, together with corresponding observations of the values of $t$, denoted $\mathbf{t} \equiv (t_1, \dots, t_N)^{\mathsf{T}}$.
Method:
i.e., function value $y(x)$ (e.g., $\sin(2\pi x)$) + Gaussian noise.
The input data set $\mathbf{x}$ in Figure 1.2 was generated by choosing values of $x_n$, for $n = 1, \dots, N$, spaced uniformly in range $[0, 1]$, and the target data set $\mathbf{t}$ was obtained by first computing the corresponding values of the function $\sin(2\pi x)$ and then adding a small level of random noise having a Gaussian distribution to each such point in order to obtain the corresponding value $t_n$.
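A minimal sketch of this generation recipe (my own code; the sample size $N = 10$ and the noise standard deviation $0.3$ are illustrative assumptions, not values given in the text above):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

N = 10                                 # number of observations (assumed)
x = np.linspace(0.0, 1.0, N)           # x_n spaced uniformly in [0, 1]
noise = rng.normal(scale=0.3, size=N)  # Gaussian noise (assumed sigma = 0.3)
t = np.sin(2 * np.pi * x) + noise      # targets t_n
```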
Discussion:
By generating data in this way, we are capturing a property of many real data sets, namely that they possess an underlying regularity, which we wish to learn, but that individual observations are corrupted by random noise. This noise might arise from intrinsically stochastic (i.e. random) processes such as radioactive decay but more typically is due to there being sources of variability that are themselves unobserved.
We fit the data using a polynomial function of the form
$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j,$$
where $M$ is the order of the polynomial, and $x^j$ denotes $x$ raised to the power of $j$. The polynomial coefficients $w_0, \dots, w_M$ are collectively denoted by the vector $\mathbf{w}$.
Question: Why is this called a linear model or linear prediction? Why "linear"?
Answer: Note that, although the polynomial function $y(x, \mathbf{w})$ is a nonlinear function of $x$, it is a linear function of the coefficients $\mathbf{w}$. Functions, such as the polynomial, which are linear in the unknown parameters have important properties; they are called linear models and will be discussed extensively in Chapters 3 and 4.
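To make the linearity explicit (my own rewriting, not a quotation from the book), collect the powers of $x$ into a basis vector $\boldsymbol{\phi}(x)$, so that the model becomes an inner product that is linear in $\mathbf{w}$:
$$y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j = \mathbf{w}^{\mathsf{T}}\boldsymbol{\phi}(x), \qquad \boldsymbol{\phi}(x) = (1,\, x,\, x^2,\, \dots,\, x^M)^{\mathsf{T}}.$$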
The fit is obtained by minimizing the error function
$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2,$$
where the factor of $\frac{1}{2}$ is included for later convenience. Because the error function is a quadratic function of the coefficients $\mathbf{w}$, its derivatives are linear in $\mathbf{w}$, and so the minimization has a unique optimal solution in closed form, denoted by $\mathbf{w}^{\star}$.
We shall see that the least squares approach (i.e., linear regression) to finding the model parameters represents a specific case of maximum likelihood (discussed in Section 1.2.5), and that the over-fitting problem can be understood as a general property of maximum likelihood. By adopting a Bayesian approach, the over-fitting problem can be avoided. We shall see that there is no difficulty from a Bayesian perspective in employing models for which the number of parameters greatly exceeds the number of data points. Indeed, in a Bayesian model the effective number of parameters adapts automatically to the size of the data set.
The modified error function includes two terms, the usual sum-of-squares error plus a penalty that discourages large coefficients:
$$\widetilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2,$$
where $\|\mathbf{w}\|^2 \equiv \mathbf{w}^{\mathsf{T}}\mathbf{w} = w_0^2 + w_1^2 + \dots + w_M^2$, and the coefficient $\lambda$ governs the relative importance of the regularization term compared with the sum-of-squares error term.
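A minimal NumPy sketch of the corresponding closed-form (ridge) solution, $\mathbf{w}^{\star} = (\mathbf{X}^{\mathsf{T}}\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{t}$ (my own code; note it penalizes all coefficients including $w_0$, whereas in practice the bias term is often left unregularized):

```python
import numpy as np

def fit_polynomial_ridge(x, t, M, lam):
    """Minimize 0.5 * ||X w - t||^2 + 0.5 * lam * ||w||^2 in closed form."""
    X = np.vander(x, M + 1, increasing=True)
    # Regularized normal equations: (X^T X + lam * I) w = X^T t
    A = X.T @ X + lam * np.eye(M + 1)
    return np.linalg.solve(A, X.T @ t)
```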
Original post: http://www.cnblogs.com/glory-of-family/p/5602324.html