
[Machine Learning for Trading] {ud501} Lesson 23: 03-03 Assessing a learning algorithm | Lesson 24: 03-04 Ensemble learners, bagging and boosting


A closer look at KNN solutions

[slide image]

 What happens as K varies

[slide image]

 What happens as D varies

[slide image]

Metric 1: RMS error

[slide image]
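
The metric here is root-mean-squared error between predicted and actual values. A minimal numpy sketch of that computation; the y_test and y_predict arrays below are made-up numbers for illustration:

```python
import numpy as np

# Made-up actual values and model predictions, purely for illustration.
y_test = np.array([100.0, 101.5, 99.8, 102.3, 103.1])
y_predict = np.array([100.4, 101.0, 100.2, 101.8, 103.5])

# RMS error: square the residuals, average them, take the square root.
rmse = np.sqrt(np.mean((y_test - y_predict) ** 2))
print(rmse)
```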

 

 

In-sample vs out-of-sample

[slide image]

 Which is worse?

[slide image]

 Cross validation

[slide image]

5-fold cross validation
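
A rough sketch of 5-fold cross validation, here with scikit-learn's KFold and a KNeighborsRegressor on synthetic data (both are illustrative choices, not the lesson's own code): each fold takes one turn as the test set while the other four folds are used for training.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

fold_errors = []
for train_idx, test_idx in KFold(n_splits=5).split(X):
    model = KNeighborsRegressor(n_neighbors=3)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_errors.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))  # out-of-sample RMSE

print(np.mean(fold_errors))  # average test error across the 5 folds
```

Note that plain k-fold cross validation mixes past and future data, which is why the next slide uses roll forward cross validation for time-ordered financial data.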

 

 

 Roll forward cross validation

[slide image]

[slide image]
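
Roll forward cross validation keeps every training window strictly earlier in time than its test window, so the learner never peeks into the future. A small sketch of such splits using scikit-learn's TimeSeriesSplit (one possible implementation, not necessarily the lesson's):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Pretend these are 100 time-ordered feature rows (e.g. one per trading day).
X = np.arange(100).reshape(-1, 1)

for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Training always ends before testing begins, and the window rolls forward.
    print(f"train {train_idx[0]:>2}-{train_idx[-1]:>2}  ->  test {test_idx[0]:>2}-{test_idx[-1]:>2}")
```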

 

 

 

 Metric 2: correlation

[slide image]

 Correlation and RMS error

[slide image]

In most cases, as RMS error increases, correlation goes down. There are, however, cases where the opposite can happen (e.g. when the predictions carry a large bias), so it is hard to say for sure.
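
As a quick sketch, both metrics can be computed on the same predictions with numpy; np.corrcoef returns a 2x2 correlation matrix, so the off-diagonal entry is the correlation between predictions and actual values. The numbers below are made up:

```python
import numpy as np

# Made-up actual vs. predicted values.
y_test = np.array([10.0, 11.2, 9.5, 12.1, 10.8, 11.7])
y_predict = np.array([10.3, 11.0, 9.9, 11.8, 11.1, 11.5])

rmse = np.sqrt(np.mean((y_test - y_predict) ** 2))
corr = np.corrcoef(y_predict, y_test)[0, 1]  # correlation between predictions and truth

print(rmse, corr)  # usually: lower RMSE goes with correlation closer to +1
```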

 

 

 

 

Overfitting 

[slide image]

 Overfitting Quiz

[slide image]

When k = 1, the model fits the training data perfectly, therefore in-sample error is low (ideally, zero). Out-of-sample error can be quite high.

As k increases, the model becomes more generalized, thus out-of-sample error decreases at the cost of slightly increasing in-sample error.

After a certain point, the model becomes too general and starts performing worse on both training and test data.
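
A small sketch of that pattern, sweeping k for a KNeighborsRegressor on synthetic data with a simple holdout split (the data and split are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Simple holdout split: first 150 rows for training, last 50 for testing.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

def rmse(actual, predicted):
    return np.sqrt(np.mean((actual - predicted) ** 2))

for k in (1, 5, 20, 80):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    in_sample = rmse(y_train, model.predict(X_train))   # ~0 when k = 1
    out_of_sample = rmse(y_test, model.predict(X_test))
    print(k, round(in_sample, 3), round(out_of_sample, 3))
```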

 

 

 

 

 

 

A few other considerations

[slide image]

Ensemble learners 

[slide image]

 How to build an ensemble

[slide image]

If we combine several models of different types (here parameterized polynomials and non-parameterized kNN models), we can avoid being biased by one approach.

This typically results in less overfitting, and thus better predictions in the long run, especially on unseen data.
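
One way to sketch the idea: fit a polynomial regression and a kNN model on the same data and average their predictions. The models, data, and query points below are illustrative assumptions, not the lesson's code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(150, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=150)

# Two learners of different types, trained on the same data.
poly = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

# For regression, the ensemble output is simply the mean of the individual predictions.
X_query = np.array([[0.5], [1.5], [2.5]])
y_ensemble = np.mean([poly.predict(X_query), knn.predict(X_query)], axis=0)
print(y_ensemble)
```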

 

 

 

Bootstrap aggregating (bagging)

[slide image]

 Correction: In the video (around 02:06), the professor mentions that n’ should be set to about 60% of n, the number of training instances. It is more accurate to say that in most implementations, n’ = n. Because the training data is sampled with replacement, about 60% of the instances in each bag are unique.
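
A quick numeric check of that correction (the exact fraction is 1 - 1/e, about 63%, when n' = n and sampling is with replacement):

```python
import numpy as np

n = 10_000  # pretend training-set size
rng = np.random.default_rng(3)

# One bag: n indices drawn from the training set *with replacement*.
bag = rng.integers(0, n, size=n)
unique_fraction = np.unique(bag).size / n

print(unique_fraction)  # close to 1 - 1/e ≈ 0.632
```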

 

 

 

 

 Overfitting

[slide image]

Yes, as we saw earlier, a 1NN model (kNN with k = 1) matches the training data exactly, thus overfitting.

An ensemble of such learners, each trained on a slightly different dataset, will at least provide some generalization, and typically lower out-of-sample error.

 

Correction: As per the question "Which is most likely to overfit?", the correct option to pick is the first one (single 1NN). The second option (ensemble of 10) has been mistakenly marked, but the intent is the same.
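
A rough sketch comparing a single 1NN learner with a bag of ten of them, using scikit-learn's BaggingRegressor as a stand-in for the lesson's own bagging code (data and parameters are made up):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

def rmse(actual, predicted):
    return np.sqrt(np.mean((actual - predicted) ** 2))

single = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)
bagged = BaggingRegressor(KNeighborsRegressor(n_neighbors=1),
                          n_estimators=10, random_state=0).fit(X_train, y_train)

print(rmse(y_test, single.predict(X_test)))  # single 1NN: fits the training noise
print(rmse(y_test, bagged.predict(X_test)))  # bag of 10: typically lower test error
```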

 

 

 

 Bagging example

[slide image]

[slide image]

In boosting (AdaBoost), each subsequent bag gives more weight to instances that were modeled poorly by the overall system before.

 

 

 

Overfitting

[slide image]

As m (the number of bags) increases, AdaBoost focuses subsequent learners on increasingly specific data points, trying to model all the difficult examples.

Thus, compared to simple bagging, it may result in more overfitting.
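
A simplified sketch of that boosting loop, not the exact AdaBoost weighting formula: each round samples a new bag with probabilities proportional to how badly the ensemble so far predicts each training instance, so later learners concentrate on the difficult points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(5)
n = 200
X = rng.uniform(0, 10, size=(n, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=n)

weights = np.full(n, 1.0 / n)  # start with uniform sampling weights
learners = []

for m in range(10):
    # Draw the next bag: poorly modeled instances are more likely to be picked.
    idx = rng.choice(n, size=n, replace=True, p=weights)
    learners.append(KNeighborsRegressor(n_neighbors=3).fit(X[idx], y[idx]))

    # Re-weight by how badly the ensemble-so-far predicts each training instance.
    ensemble_pred = np.mean([lrn.predict(X) for lrn in learners], axis=0)
    errors = np.abs(y - ensemble_pred)
    weights = errors / errors.sum()

# Final prediction: mean of the boosted learners' outputs.
print(np.mean([lrn.predict(X[:5]) for lrn in learners], axis=0))
```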

 

 

 

 Summary

[slide image]


Original post: https://www.cnblogs.com/ecoflex/p/10977437.html
