
Statistical Learning Notes (4): Linear Regression (1)


Tags: Quantitative algorithm, Error evaluation

In this chapter, we review some of the key ideas underlying the linear regression model, as well as the least squares approach that is most commonly used to fit this model.

Basic form:

Y \approx \beta_0 + \beta_1 X

“≈” means “is approximately modeled as”. To estimate the parameters, by far the most common approach is to minimize the least squares criterion. Let the training samples be (x_1, y_1), \ldots, (x_n, y_n), and define the residual sum of squares (RSS) as

\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2

Minimizing the RSS gives the least squares coefficient estimates:

\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
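As a concrete illustration (not part of the original notes), here is a minimal Python/NumPy sketch that applies these closed-form estimates to synthetic data; the variable names and the data-generating model are illustrative.

```python
# Closed-form least squares for simple linear regression on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)  # true beta0=2, beta1=3

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
print(beta0_hat, beta1_hat)  # should be close to 2 and 3
```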

Population regression line: the best linear approximation to the true relationship between X and Y.

Least squares regression line: the estimate of that line computed from a limited sample.

To evaluate how well our estimate matches the true value, we use the standard error. For example, when estimating the mean \mu of a random variable:

\mathrm{Var}(\hat\mu) = \mathrm{SE}(\hat\mu)^2 = \frac{\sigma^2}{n}

where \sigma is the standard deviation of each of the realizations y_i and n is the number of samples.

In this formulation, the standard errors of the coefficient estimates are

\mathrm{SE}(\hat\beta_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right], \qquad \mathrm{SE}(\hat\beta_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

where \sigma^2 = \mathrm{Var}(\epsilon).
In practice, \sigma is unknown and must be estimated from the data; this estimate is known as the residual standard error (RSE):

\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - 2}}

The approximate 95% confidence intervals for \beta_1 and \beta_0 are:

\hat\beta_1 \pm 2 \cdot \mathrm{SE}(\hat\beta_1), \qquad \hat\beta_0 \pm 2 \cdot \mathrm{SE}(\hat\beta_0)

As an example, we can use the t-statistic

t = \frac{\hat\beta_1 - 0}{\mathrm{SE}(\hat\beta_1)}

to test the null hypothesis that \beta_1 = 0, i.e., that there is no relationship between X (the predictor) and Y (the response). The associated p-value tells us how likely a value of t at least this extreme would be if that null hypothesis were true.
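The following minimal sketch (again not from the original notes) ties these pieces together: the RSE, the standard error of \hat\beta_1, the ±2·SE confidence interval, and the t-test; the data are synthetic and the names illustrative.

```python
# RSE, SE(beta1_hat), approximate 95% CI, t-statistic, and two-sided p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

resid = y - (beta0 + beta1 * x)
rss = np.sum(resid ** 2)
rse = np.sqrt(rss / (n - 2))                       # estimate of sigma
se_beta1 = rse / np.sqrt(np.sum((x - x_bar) ** 2))

ci = (beta1 - 2 * se_beta1, beta1 + 2 * se_beta1)  # approximate 95% CI
t_stat = beta1 / se_beta1                          # tests H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # two-sided p-value
print(ci, t_stat, p_value)
```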

Assessing the Accuracy of the Model
Two quantities are typically used:

1. Residual Standard Error

\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - 2}} = \sqrt{\frac{1}{n - 2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}

2. R^2 Statistic

R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad \mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2


TSS measures the total variance in the response Y, and can be thought of as the amount of variability inherent in the response before the regression is performed. In contrast, RSS measures the amount of variability that is left unexplained after performing the regression. TSS − RSS therefore measures the amount of variability in the response that is explained (or removed) by performing the regression. An R^2 statistic that is close to 1 indicates that a large proportion of the variability in the response has been explained by the regression. Alternatively, we can use the correlation

\mathrm{Cor}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

to assess the fit of the linear model. In the simple linear regression setting, R^2 = \mathrm{Cor}(X, Y)^2.
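A short numerical check of this identity, again as an illustrative sketch on made-up data:

```python
# Verify that R^2 equals Cor(X, Y)^2 in simple linear regression.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.8 * x + rng.normal(size=200)

x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar
y_hat = beta0 + beta1 * x

tss = np.sum((y - y_bar) ** 2)   # total variability
rss = np.sum((y - y_hat) ** 2)   # variability left unexplained
r2 = 1 - rss / tss
cor = np.corrcoef(x, y)[0, 1]
print(r2, cor ** 2)              # the two values agree
```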

Multiple Linear Regression

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon

We again use least squares, choosing \hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p to minimize

\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip} \right)^2
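The original notes give no code, but a minimal sketch of multiple-regression least squares, here solved with NumPy's lstsq on a design matrix with an intercept column, may help; the data and coefficient values are made up for illustration.

```python
# Multiple linear regression by least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = 4.0 + X @ beta_true + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), X])           # prepend intercept column
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)                                 # ~ [4, 1, -2, 0.5]
```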

We also use the F-statistic to test whether at least one predictor is related to the response:

F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}

An explanation of the above expression: one can show that

E\{\mathrm{RSS}/(n - p - 1)\} = \sigma^2, \qquad \text{and, provided } H_0 \text{ is true,} \quad E\{(\mathrm{TSS} - \mathrm{RSS})/p\} = \sigma^2

RSS represents the variability left unexplained and TSS the total variability. Since we estimate p + 1 coefficients from n observations, RSS has n − p − 1 degrees of freedom, while TSS − RSS has p.

Subtopic 1: whether at least one of the predictors is useful in predicting the response.

We want to decide which of

H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \qquad \text{versus} \qquad H_a: \text{at least one } \beta_j \neq 0

is true. If the F-statistic is close to 1, the data are consistent with H_0; if it is substantially greater than 1, there is evidence for H_a.

It turns out that the answer depends on the values of n and p. When n is large, an F-statistic that is just a little larger than 1 might still provide evidence against H_0. In contrast, a larger F-statistic is needed to reject H_0 if n is small. When H_0 is true and the errors \epsilon_i have a normal distribution, the F-statistic follows an F-distribution. For any given value of n and p, any statistical software package can be used to compute the p-value associated with the F-statistic using this distribution. Based on this p-value, we can determine whether or not to reject H_0.
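As a sketch of how this computation might look (illustrative code, not from the original notes), assuming the F(p, n − p − 1) reference distribution described above:

```python
# Overall F-test: F-statistic from TSS and RSS, p-value from F(p, n-p-1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = 4.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
rss = np.sum((y - X1 @ beta_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(f_stat, p, n - p - 1)      # P(F > f_stat) under H0
print(f_stat, p_value)
```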

Here the p-value is defined as the probability, under the null hypothesis H_0, of obtaining a result at least as extreme as the one actually observed. A small p-value indicates that an F-statistic this large would rarely arise by chance if H_0 were true, so we can reject H_0 and conclude that at least one of the p predictors is related to the response. Note that this reasoning is valid because the F-statistic follows an F-distribution only when H_0 is true and the errors \epsilon_i are normally distributed.


Sometimes we only want to test a particular subset of q of the coefficients, for example with null hypothesis

H_0: \beta_{p-q+1} = \beta_{p-q+2} = \cdots = \beta_p = 0

where for convenience the q variables in question are listed last.

Suppose that the residual sum of squares for the model that omits those q variables is \mathrm{RSS}_0. The appropriate F-statistic is then

F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/q}{\mathrm{RSS}/(n - p - 1)}
For each individual predictor, a t-statistic and a p-value are reported. Each of these is exactly equivalent to the F-test that omits that single variable from the model (the case q = 1), so it reports the partial effect of adding that variable to the model.
But we should not rely only on the individual p-values; we should also examine the overall F-statistic. When the number of predictors is large, some individual p-values will fall below 0.05 by chance alone even if no predictor is truly related to the response, whereas even if H_0 is true there is only a 5% chance that the F-statistic yields a p-value below 0.05, regardless of the number of predictors or the number of observations.
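A minimal sketch of the partial F-test (illustrative, not from the original notes): fit the full model and a reduced model that omits the last q predictors, then compare \mathrm{RSS}_0 against \mathrm{RSS}. Here the dropped predictors are truly irrelevant by construction.

```python
# Partial F-test for dropping the last q predictors from a p-predictor model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p, q = 200, 4, 2
X = rng.normal(size=(n, p))
# only the first two predictors matter, so dropping the last q=2 is harmless
y = 1.0 + X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

def rss_of(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

full = np.column_stack([np.ones(n), X])
reduced = np.column_stack([np.ones(n), X[:, : p - q]])  # omit last q columns

rss = rss_of(full, y)
rss0 = rss_of(reduced, y)
f_stat = ((rss0 - rss) / q) / (rss / (n - p - 1))
p_value = stats.f.sf(f_stat, q, n - p - 1)
print(f_stat, p_value)   # large p-value: the dropped predictors add little
```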
Subtopic 2: deciding which variables are important.

To determine which predictors are important, we can try one of three classical selection strategies:

Method 1 (forward selection): We begin with the null model (an intercept but no predictors), fit p simple linear regressions, and add to the null model the variable that results in the lowest RSS; the step is then repeated with the remaining variables (a minimal sketch of this greedy procedure appears after this list).

Method 2 (backward selection): We start with all variables in the model and repeatedly remove the variable with the largest p-value, refitting after each removal.

Method 3 (mixed selection): We start with no variables in the model and add, one at a time, the variable that provides the best fit. If at any point the p-value for one of the variables in the model rises above a certain threshold, we remove that variable. We continue these forward and backward steps until all variables in the model have a sufficiently low p-value and all variables outside the model would have a large p-value if added.
Backward selection cannot be used if p > n, while forward selection can always be used. Forward selection is a greedy approach and might include variables early that later become redundant; mixed selection can remedy this.
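Here is the forward-selection sketch promised above (illustrative code on synthetic data, not from the original notes): at each step, greedily add the predictor that most reduces the RSS.

```python
# Forward selection: greedily add the predictor giving the lowest RSS.
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)

def rss_of(cols):
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

selected, remaining = [], list(range(p))
for _ in range(p):
    best = min(remaining, key=lambda j: rss_of(selected + [j]))
    selected.append(best)
    remaining.remove(best)
    print(selected, rss_of(selected))  # RSS drops sharply for useful variables
```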

Subtopic 3: assessing the quality of the model fit.

We can gauge the usefulness of predictors by adding them one at a time and evaluating the RSS and RSE. If the RSE drops substantially, the added predictor is useful. Note that in multiple regression \mathrm{RSE} = \sqrt{\mathrm{RSS}/(n - p - 1)}, so a model with more variables can have a higher RSE if the decrease in RSS is small relative to the increase in p.
Subtopic 4: given a set of predictor values, what response value should we predict, and how accurate is our prediction?
We use prediction intervals to answer this question. Prediction intervals are always wider than confidence intervals, because they incorporate both the error in the estimate for f(X) (the reducible error) and the uncertainty as to how much an individual point will differ from the population regression plane (the irreducible error).
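A minimal sketch contrasting the two intervals (illustrative, assuming the standard t-based interval formulas for simple linear regression): the confidence interval covers the mean response at a point x0, while the prediction interval covers a single new observation and is always wider.

```python
# Confidence interval for the mean response vs. prediction interval at x0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

x_bar = x.mean()
beta1 = np.sum((x - x_bar) * (y - y.mean())) / np.sum((x - x_bar) ** 2)
beta0 = y.mean() - beta1 * x_bar
rse = np.sqrt(np.sum((y - beta0 - beta1 * x) ** 2) / (n - 2))

x0 = 1.0
y0_hat = beta0 + beta1 * x0
t = stats.t.ppf(0.975, df=n - 2)
lever = 1 / n + (x0 - x_bar) ** 2 / np.sum((x - x_bar) ** 2)

conf = (y0_hat - t * rse * np.sqrt(lever), y0_hat + t * rse * np.sqrt(lever))
pred = (y0_hat - t * rse * np.sqrt(1 + lever), y0_hat + t * rse * np.sqrt(1 + lever))
print(conf, pred)   # the prediction interval is the wider of the two
```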


Original article: http://blog.csdn.net/jyl1999xxxx/article/details/51034449
