
Machine Learning: Linear Regression

Date: 2019-01-13 21:14:38


This post walks through linear regression with scikit-learn, using the Kaggle competition House Prices: Advanced Regression Techniques as the example.

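import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score

# Assumed definition of `scorer` (not shown in the original code): negated MSE,
# consistent with the np.sqrt(-cross_val_score(...)) calls below
scorer = make_scorer(mean_squared_error, greater_is_better = False)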

1. Regression models

(1) Ordinary linear regression: Linear Regression without regularization


from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train, y_train)

model = lr
rmse = np.sqrt(-cross_val_score(model, X_train, y_train, scoring = scorer, cv = 10))   # root mean squared error, one value per fold

y_train_pred = lr.predict(X_train)
y_test_pred = lr.predict(X_test)

RMSE on Training set : 0.3887932081355326
RMSE on Test set : 0.42716116210823174
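
The `rmse` line above relies on a `scorer` object and on negating the scores before taking the square root. A minimal sketch of that computation on synthetic data follows, assuming `scorer` is a negated mean-squared-error scorer built with `make_scorer`. The synthetic data and the `*_demo` names are illustrative and not part of the House Prices pipeline.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the preprocessed training data (assumption).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Assumed definition of `scorer`: MSE with the sign flipped, because
# scikit-learn scorers follow a "greater is better" convention.
scorer = make_scorer(mean_squared_error, greater_is_better=False)

# One negated MSE value per fold.
neg_mse = cross_val_score(LinearRegression(), X_demo, y_demo, scoring=scorer, cv=10)

# Negate back to a positive MSE, then take the square root to get RMSE.
rmse_per_fold = np.sqrt(-neg_mse)

print("RMSE per fold:", rmse_per_fold.round(3))
print("Mean CV RMSE :", rmse_per_fold.mean().round(3))
```

Because `cross_val_score` returns one score per fold, `rmse` in the snippets of this post is an array of 10 values, and its mean is the usual single-number summary.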

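(2) Ridge regression: Linear Regression with Ridge regularization (L2 regularization)

Regularization helps to handle collinearity, filter out noise, and prevent overfitting.

Principle: it introduces additional information (bias) that penalizes extreme parameter weights.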
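
To illustrate the effect on collinear features, here is a small sketch on synthetic data (not the House Prices data). Two nearly identical columns are fitted with and without an L2 penalty; `alpha=1.0` and all `*_demo` names are arbitrary choices made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # almost a copy of x1, so the columns are collinear
X_demo = np.column_stack([x1, x2])
y_demo = 3.0 * x1 + rng.normal(scale=0.5, size=n)

ols_demo = LinearRegression().fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)

print("OLS coefficients  :", ols_demo.coef_)
print("Ridge coefficients:", ridge_demo.coef_)
print("OLS weight norm   :", np.linalg.norm(ols_demo.coef_))
print("Ridge weight norm :", np.linalg.norm(ridge_demo.coef_))
```

Without a penalty, the split of the weight between the two columns is poorly determined and the individual coefficients tend to be unstable. The L2 penalty never increases the norm of the weight vector, which is the "penalizing extreme weights" behavior described above.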


from sklearn.linear_model import RidgeCV

# First pass: coarse search over a wide range of alphas
ridge = RidgeCV(alphas = [0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6, 10, 30, 60])
ridge.fit(X_train, y_train)
alpha = ridge.alpha_

# Second pass: finer search centered on the best alpha from the first pass
ridge = RidgeCV(alphas = [alpha * .6, alpha * .65, alpha * .7, alpha * .75, alpha * .8, alpha * .85,
                          alpha * .9, alpha * .95, alpha, alpha * 1.05, alpha * 1.1, alpha * 1.15,
                          alpha * 1.25, alpha * 1.3, alpha * 1.35, alpha * 1.4],
                cv = 10)
ridge.fit(X_train, y_train)
alpha = ridge.alpha_

model = ridge
rmse = np.sqrt(-cross_val_score(model, X_train, y_train, scoring = scorer, cv = 10))   # root mean squared error, one value per fold

y_train_rdg = ridge.predict(X_train)
y_test_rdg = ridge.predict(X_test)

Ridge RMSE on Training set : 0.11540572328450796
Ridge RMSE on Test set : 0.11642721377799556
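
The Ridge code searches for alpha in two passes: a coarse grid first, then a finer grid of multiples around the best coarse value. A possible way to express that pattern more compactly is sketched below. `refine_grid` is a hypothetical helper, not part of scikit-learn or of the original code; it also includes the 1.2 multiplier, which the hand-written list skips. The data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

def refine_grid(best_alpha):
    """Hypothetical helper: candidates from 0.6x to 1.4x of best_alpha in 0.05 steps."""
    multipliers = np.arange(0.60, 1.40 + 1e-9, 0.05)
    return best_alpha * multipliers

# Synthetic stand-in for the training data (assumption).
X_demo, y_demo = make_regression(n_samples=150, n_features=20, noise=15.0, random_state=0)

# Pass 1: coarse grid spanning several orders of magnitude.
coarse = RidgeCV(alphas=[0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6, 10, 30, 60])
coarse.fit(X_demo, y_demo)

# Pass 2: fine grid centered on the best coarse alpha, with 10-fold CV.
fine = RidgeCV(alphas=refine_grid(coarse.alpha_), cv=10)
fine.fit(X_demo, y_demo)

print("coarse alpha :", coarse.alpha_)
print("refined alpha:", fine.alpha_)
```

The same helper could feed `LassoCV` and `ElasticNetCV`, since all three estimators accept an array of candidate alphas.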

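(3) Lasso regression: Linear Regression with Lasso regularization (L1 regularization)

LASSO stands for Least Absolute Shrinkage and Selection Operator. It simply replaces the sum of squared weights in the penalty term with the sum of the absolute values of the weights.

In contrast to L2 regularization, L1 regularization produces sparse weight vectors, in which most feature weights are exactly 0.

Sparsity is very helpful when a high-dimensional dataset contains many irrelevant features.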
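
The sparsity effect can be checked on a synthetic dataset in which only a few features matter. This is a sketch under assumptions: the data comes from `make_regression`, `alpha=1.0` is an arbitrary penalty strength, and the `*_demo` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 50 features, of which only 5 actually influence the target.
X_demo, y_demo = make_regression(n_samples=200, n_features=50, n_informative=5,
                                 noise=5.0, random_state=0)

lasso_demo = Lasso(alpha=1.0, max_iter=50000).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)

# Count coefficients that are exactly zero under each penalty.
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))

print("Lasso zero coefficients:", n_zero_lasso, "of", X_demo.shape[1])
print("Ridge zero coefficients:", n_zero_ridge, "of", X_demo.shape[1])
```

Ridge shrinks weights toward zero but generally leaves them nonzero, whereas Lasso sets many of them exactly to zero, which is why it doubles as a feature selector.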


from sklearn.linear_model import LassoCV

# First pass: coarse search over a wide range of alphas
lasso = LassoCV(alphas = [0.0001, 0.0003, 0.0006, 0.001, 0.003, 0.006, 0.01, 0.03, 0.06, 0.1,
                          0.3, 0.6, 1],
                max_iter = 50000, cv = 10)
lasso.fit(X_train, y_train)
alpha = lasso.alpha_

# Second pass: finer search centered on the best alpha from the first pass
lasso = LassoCV(alphas = [alpha * .6, alpha * .65, alpha * .7, alpha * .75, alpha * .8,
                          alpha * .85, alpha * .9, alpha * .95, alpha, alpha * 1.05,
                          alpha * 1.1, alpha * 1.15, alpha * 1.25, alpha * 1.3, alpha * 1.35,
                          alpha * 1.4],
                max_iter = 50000, cv = 10)
lasso.fit(X_train, y_train)
alpha = lasso.alpha_

model = lasso
rmse = np.sqrt(-cross_val_score(model, X_train, y_train, scoring = scorer, cv = 10))   # root mean squared error, one value per fold

y_train_las = lasso.predict(X_train)
y_test_las = lasso.predict(X_test)

Lasso RMSE on Training set : 0.11411150837458062
Lasso RMSE on Test set : 0.11583213221750702

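(4) ElasticNet regression: Linear Regression with ElasticNet regularization (L1 and L2 regularization)

ElasticNet is a compromise between Ridge and Lasso regression: the L1 penalty produces sparsity, while the L2 penalty overcomes some limitations of Lasso (for example, Lasso cannot select more features than there are observations).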
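
For reference, the three penalties can be written side by side. The formulas below follow the parameterization documented by scikit-learn, with n the number of samples, w the weight vector, alpha the penalty strength, and rho standing for the `l1_ratio` parameter; the symbol names are chosen here for illustration.

```latex
\begin{aligned}
\text{Ridge:}\quad
  & \min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso:}\quad
  & \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{ElasticNet:}\quad
  & \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
    + \alpha \rho \lVert w \rVert_1
    + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
\end{aligned}
```

Setting rho = 1 removes the L2 term and leaves exactly the Lasso objective, while rho = 0 leaves only the L2 penalty.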


from sklearn.linear_model import ElasticNetCV

# First pass: coarse search over both l1_ratio and alpha
elasticNet = ElasticNetCV(l1_ratio = [0.1, 0.3, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1],
                          alphas = [0.0001, 0.0003, 0.0006, 0.001, 0.003, 0.006,
                                    0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6],
                          max_iter = 50000, cv = 10)
elasticNet.fit(X_train, y_train)
alpha = elasticNet.alpha_
ratio = elasticNet.l1_ratio_

# Second pass: refine l1_ratio around the best value.
# Candidates are capped at 1 because l1_ratio must lie in [0, 1].
elasticNet = ElasticNetCV(l1_ratio = [min(ratio * m, 1) for m in (.85, .9, .95, 1, 1.05, 1.1, 1.15)],
                          alphas = [0.0001, 0.0003, 0.0006, 0.001, 0.003, 0.006, 0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6],
                          max_iter = 50000, cv = 10)
elasticNet.fit(X_train, y_train)
alpha = elasticNet.alpha_
ratio = elasticNet.l1_ratio_

# Third pass: keep l1_ratio fixed and refine alpha around the best value
elasticNet = ElasticNetCV(l1_ratio = ratio,
                          alphas = [alpha * .6, alpha * .65, alpha * .7, alpha * .75, alpha * .8, alpha * .85, alpha * .9,
                                    alpha * .95, alpha, alpha * 1.05, alpha * 1.1, alpha * 1.15, alpha * 1.25, alpha * 1.3,
                                    alpha * 1.35, alpha * 1.4],
                          max_iter = 50000, cv = 10)
elasticNet.fit(X_train, y_train)
alpha = elasticNet.alpha_
ratio = elasticNet.l1_ratio_

model = elasticNet
rmse = np.sqrt(-cross_val_score(model, X_train, y_train, scoring = scorer, cv = 10))   # root mean squared error, one value per fold

y_train_ela = elasticNet.predict(X_train)
y_test_ela = elasticNet.predict(X_test)

ElasticNet RMSE on Training set : 0.11411150837458062
ElasticNet RMSE on Test set : 0.11583213221750702

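Here ratio = 1, which means the model uses L1 regularization only and reduces to Lasso regression. This is why the ElasticNet RMSE values are identical to the Lasso ones.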
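
The equivalence at ratio = 1 can be verified directly. The following sketch fits `ElasticNet` with `l1_ratio=1.0` and `Lasso` with the same alpha on synthetic data and compares the fitted coefficients; the data, the alpha value, and the `*_demo` names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic stand-in for the training data (assumption).
X_demo, y_demo = make_regression(n_samples=200, n_features=30, n_informative=8,
                                 noise=5.0, random_state=0)

alpha_demo = 0.5  # arbitrary illustrative penalty strength

# With l1_ratio=1.0 the L2 part of the ElasticNet penalty vanishes.
enet_demo = ElasticNet(alpha=alpha_demo, l1_ratio=1.0, max_iter=50000).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=alpha_demo, max_iter=50000).fit(X_demo, y_demo)

same_coef = np.allclose(enet_demo.coef_, lasso_demo.coef_)
same_intercept = np.isclose(enet_demo.intercept_, lasso_demo.intercept_)
print("Same coefficients:", same_coef)
print("Same intercept   :", same_intercept)
```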


Original article: https://www.cnblogs.com/hugechuanqi/p/10263848.html
