Tags:
Quantitative algorithm
Error evaluation
In this chapter, we review some of the key ideas underlying the linear regression model, as well as the least squares approach that is most commonly used to fit this model.
Basic form:
Y ≈ β0 + β1*X
Here "≈" means "is approximately modeled as". To estimate the parameters, by far the most common approach involves minimizing the least squares criterion. Let the training samples be (x1, y1), ..., (xn, yn), and define the
residual sum of squares (RSS) as
RSS = Σ (yi − β0 − β1*xi)^2
Minimizing the RSS, we get the least squares estimates
β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)^2,  β0 = ȳ − β1*x̄
where x̄ and ȳ are the sample means.
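As a rough illustration of these formulas, here is a minimal Python sketch that fits a simple linear regression with the closed-form least squares estimates. The simulated data, variable names, and coefficients are assumptions made up for demonstration and are not from the original post.

```python
import numpy as np

# Simulated placeholder data: x is the predictor, y the response.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least squares estimates for simple linear regression.
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

# Residual sum of squares of the fitted line.
rss = np.sum((y - beta0 - beta1 * x) ** 2)
print(beta0, beta1, rss)
```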
Population regression line: the best linear approximation to the true relationship between X and Y.
Least squares regression line: the estimate of that line computed from a limited sample.
To evaluate how well our estimates match the true values, we use the standard error. For example, for the sample mean μ̂ of Y:
SE(μ̂)^2 = σ^2 / n
where σ is the standard deviation of each of the samples yi and n is the number of samples.
In this formulation, the standard errors of the coefficient estimates are:
SE(β0)^2 = σ^2 * [1/n + x̄^2 / Σ(xi − x̄)^2],  SE(β1)^2 = σ^2 / Σ(xi − x̄)^2
In general σ is not known; its estimate is the residual standard error (RSE), RSE = sqrt(RSS / (n − 2)), and we use the RSE in place of σ in these formulas.
The 95% confidence intervals for β1 and β0 are approximately:
β1 ± 2*SE(β1),  β0 ± 2*SE(β0)
As an example, we can use the t-statistic
t = (β1 − 0) / SE(β1)
to test whether β1 is zero, i.e. the non-existence of the relationship between X (predictor) and Y (response); the associated p-value is the probability of observing such a large |t| by chance if β1 were actually zero.
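Continuing in the same spirit, below is a minimal sketch of how the standard errors, the approximate 95% confidence interval, and the t-test for β1 could be computed; the simulated data and names are illustrative assumptions, not part of the original post.

```python
import numpy as np
from scipy import stats

# Simulated placeholder data (same setup as the sketch above).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

rss = np.sum((y - beta0 - beta1 * x) ** 2)
rse = np.sqrt(rss / (n - 2))                       # estimate of sigma

# Standard errors of the coefficient estimates (sigma replaced by RSE).
se_beta1 = rse / np.sqrt(np.sum((x - x_bar) ** 2))
se_beta0 = rse * np.sqrt(1.0 / n + x_bar ** 2 / np.sum((x - x_bar) ** 2))

# Approximate 95% confidence interval: estimate +/- 2 * SE.
ci_beta1 = (beta1 - 2 * se_beta1, beta1 + 2 * se_beta1)

# t-statistic and two-sided p-value for H0: beta1 = 0.
t_stat = beta1 / se_beta1
p_value = 2 * stats.t.sf(np.abs(t_stat), df=n - 2)
print(ci_beta1, t_stat, p_value)
```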
Assessing the Accuracy of the Model
There are two main criteria:
1. Residual Standard Error
2. R^2 Statistic
R^2 = (TSS − RSS) / TSS = 1 − RSS/TSS, where TSS = Σ(yi − ȳ)^2 is the total sum of squares. TSS measures the total variance in the response Y, and can be thought of as the amount of variability inherent in the response before the regression is performed. In contrast, RSS measures the amount of variability that is left unexplained
after performing the regression. TSS − RSS measures the amount of variability in the response that is explained (or removed) by performing the regression. An R^2 statistic that is close to 1 indicates that a large proportion of the variability in the response
has been explained by the regression. Alternatively, the correlation Cor(X, Y) can also be used to assess the fit of the linear model. In the simple linear model,
R^2 = Cor(X, Y)^2.
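A small sketch of the RSE and R^2 computations, again on simulated placeholder data; it also checks the R^2 = Cor(X, Y)^2 identity for the simple model.

```python
import numpy as np

# Simulated placeholder data.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

rss = np.sum((y - beta0 - beta1 * x) ** 2)
tss = np.sum((y - y_bar) ** 2)

rse = np.sqrt(rss / (n - 2))          # residual standard error
r2 = 1.0 - rss / tss                  # R^2 statistic

# In simple linear regression, R^2 equals the squared correlation.
assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
print(rse, r2)
```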
Multiple Linear Regression
Given p predictors X1, ..., Xp, the model is Y = β0 + β1*X1 + ... + βp*Xp + ε, and we can use least squares to get the coefficient estimates that minimize RSS = Σ (yi − β0 − β1*xi1 − ... − βp*xip)^2.
We also use the F-statistic to test whether there is any relationship between the response and the predictors:
F = [(TSS − RSS) / p] / [RSS / (n − p − 1)]
An explanation of the above expression is:
the RSS represents the variability left unexplained and TSS is the total variability. Since the model estimates p + 1 coefficients (including the intercept) from n observations, RSS has n − p − 1 degrees of freedom, while the explained part TSS − RSS has p degrees of freedom.
Subtopic1:
whether at least one of the predictors is useful in predicting the response.
The F-statistic indicates which one of
H0: β1 = β2 = ... = βp = 0 versus Ha: at least one βj is non-zero
is true. When there is no relationship (H0 holds), we expect the F-statistic to be close to 1; when Ha is true, we expect it to be greater than 1. How much larger than 1 does the F-statistic need to be before we can reject H0?
It turns out that the answer depends on the values of n and p. When n is large, an F-statistic that is just a little larger than 1 might still provide evidence
against H0. In contrast, a larger F-statistic is needed to reject H0 if n is small.
When H0 is true and the errors εi have a normal distribution, the F-statistic follows an F-distribution. For any given value of n and p, any statistical software package can be used to compute the p-value associated with the F-statistic using
this distribution. Based on this p-value, we can determine whether or not to reject H0.
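To make this concrete, here is a hedged sketch that fits a multiple regression on simulated placeholder data, forms the F-statistic as above, and looks up its p-value under the F(p, n − p − 1) distribution with scipy; all names and values are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Simulated placeholder data: n observations of p predictors.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Fit the multiple regression by least squares (intercept column added).
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

rss = np.sum((y - X1 @ beta_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

# F-statistic for H0: beta1 = ... = betap = 0.
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(f_stat, p, n - p - 1)   # upper-tail probability
print(f_stat, p_value)
```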
Here the p-value is defined as the probability, under the null hypothesis H0, of obtaining a result equal to or more extreme than what was actually observed. The reason a small p-value indicates a relationship between at least one of the p predictors and the response is the following: the F-statistic follows an F-distribution only when H0 is true and the errors εi have a normal distribution, so a small p-value means the observed F-statistic would be very unlikely under H0; the null
hypothesis does not adequately explain the observation, and we reject it.
Sometimes we only want to test a subset of the parameters, for example that a particular set of q coefficients is zero:
H0: β(p−q+1) = β(p−q+2) = ... = βp = 0.
In that case we fit a second model that uses all the variables except those last q. Suppose
that the residual sum of squares for that model is RSS0; then the appropriate F-statistic is
F = [(RSS0 − RSS) / q] / [RSS / (n − p − 1)]
For each individual predictor, a t-statistic and a p-value are reported. It turns out that each of
these is exactly equivalent to the F-test that omits that single variable from the model (the case q = 1), so it reports the partial effect of adding that variable to the model.
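The q = 1 equivalence can be checked numerically. The sketch below uses simulated placeholder data, computes the t-statistic for one coefficient and the F-statistic obtained by omitting that single variable, and shows that t^2 coincides with F (up to numerical precision).

```python
import numpy as np

# Simulated placeholder data, as in the sketch above.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)
X1 = np.column_stack([np.ones(n), X])          # design matrix with intercept

def rss_of(design):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

rss_full = rss_of(X1)
sigma2_hat = rss_full / (n - p - 1)

# t-statistic for the last predictor in the full model.
beta_full, *_ = np.linalg.lstsq(X1, y, rcond=None)
cov = sigma2_hat * np.linalg.inv(X1.T @ X1)
t_stat = beta_full[-1] / np.sqrt(cov[-1, -1])

# F-statistic from the model that omits that single predictor (q = 1).
rss0 = rss_of(X1[:, :-1])
f_stat = (rss0 - rss_full) / (rss_full / (n - p - 1))

print(t_stat ** 2, f_stat)   # the two values coincide
```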
However, we cannot rely only on the p-value for each individual predictor; we should also examine the overall F-statistic. When the number of predictors p is large, some individual p-values will fall below 0.05 by chance alone even if no predictor is truly related to the response, whereas if H0 is true there is only a 5% chance that the F-statistic will result in a p-value below 0.05, regardless of the number of predictors or
the number of observations.
Subtopic2:
Importance of variables
To determine which predictors are important, we can try the following variable-selection methods:
Method1 (forward selection):
We begin with the null model (intercept only), then fit p simple linear regressions and add to the null model the variable that results in the lowest RSS. We repeat this, at each step adding the variable that gives the lowest RSS for the enlarged model, until some stopping rule is satisfied (a code sketch of this procedure appears after this list).
Method2 (backward selection): We start with all variables in the model, and repeatedly remove the variable with the largest p-value (the least significant variable), refitting the model after each removal.
Method3 (mixed selection): We start with no variables in the model and add the variable that provides the best fit. If at any point the p-value for one of the variables in the model rises above a certain threshold,
then we remove that variable from the model. We continue to perform these forward and backward steps until all variables in the model have a sufficiently low p-value, and all variables outside the model would have a large p-value if added to the model.
Backward selection cannot be used if p > n, while forward selection can always be used. Forward selection is a greedy approach, and might include variables early that later become redundant. Mixed selection can remedy this.
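Below is a rough sketch of Method1 (forward selection) on simulated placeholder data. The greedy loop adds, at each step, the predictor that most reduces the RSS; the n_keep stopping rule is a simplification introduced here for illustration and is not part of the original description.

```python
import numpy as np

def rss_of(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

def forward_selection(X, y, n_keep):
    """Greedy forward selection: add the predictor that lowers RSS the most."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    design = np.ones((n, 1))                      # start from the null model
    while remaining and len(selected) < n_keep:
        # Try adding each remaining predictor and keep the best one.
        scores = {j: rss_of(np.column_stack([design, X[:, j]]), y)
                  for j in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        design = np.column_stack([design, X[:, best]])
    return selected

# Simulated placeholder data: only the first two predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(forward_selection(X, y, n_keep=2))          # expected: [0, 1]
```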
Subtopic3: Model fitting quality
We can assess the usefulness of predictors by adding them one at a time and evaluating the RSS and RSE. If the RSE is greatly reduced, the added predictor is useful. Note, however, that a model with more variables can have a higher RSE if the decrease in RSS is small
relative to the increase in p, since RSE = sqrt(RSS / (n − p − 1)) in multiple regression.
Subtopic4: Given a set of predictor values, what response value should we predict, and how accurate is our prediction?
We use prediction intervals to answer this question. Prediction intervals are always wider than confidence intervals, because they incorporate both the error in the estimate for f(X) (the reducible error) and the uncertainty as to how much an individual point will differ from the population regression plane (the irreducible error).
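A minimal sketch contrasting the two intervals for a simple regression at a hypothetical new point x0 = 1.5; the data and names are illustrative assumptions, not from the original post.

```python
import numpy as np
from scipy import stats

# Simulated placeholder simple-regression data.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
beta0 = y_bar - beta1 * x_bar
rse = np.sqrt(np.sum((y - beta0 - beta1 * x) ** 2) / (n - 2))

x0 = 1.5                                  # hypothetical new predictor value
y0_hat = beta0 + beta1 * x0
t = stats.t.ppf(0.975, df=n - 2)

# Confidence interval: uncertainty in the estimated mean response f(x0).
half_ci = t * rse * np.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
# Prediction interval: also adds the irreducible error of a single response.
half_pi = t * rse * np.sqrt(1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sxx)

print((y0_hat - half_ci, y0_hat + half_ci))   # narrower
print((y0_hat - half_pi, y0_hat + half_pi))   # wider
```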
Original post: http://blog.csdn.net/jyl1999xxxx/article/details/51034449