
Feature Engineering

Posted: 2017-09-22 12:04:29


 

1. remove skew

Why: 

   Many models are built on the assumption that the input data follow a normal (Gaussian) distribution. So the closer the input data are to a normal distribution, the better these models tend to perform.

Methods:

  •     reduce skewness with a log transform (use log(1 + x) when the data contain zeros).
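As a minimal sketch of the log transform, the block below measures skewness before and after applying log(1 + x) to a right-skewed feature. The `incomes` values are invented for illustration, the helper names `skewness` and `log_transform` are hypothetical, and the transform assumes every value is non-negative.

```python
import math


def skewness(values):
    """Skewness as the mean of cubed z-scores (population moments)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


def log_transform(values):
    """Apply log(1 + x) to each value; assumes every value is >= 0."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed feature: mostly small values with a long upper tail.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 60, 150, 400]

print(f"skew before: {skewness(incomes):.2f}")
print(f"skew after:  {skewness(log_transform(incomes)):.2f}")
```

The log compresses the large values far more than the small ones, so the long right tail shrinks. It only helps with right (positive) skew; left-skewed data need a different transform.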

 

2. standardization

Why:

    Different features have different scales. Standardization keeps features with a large numeric range from receiving disproportionately high weight, for example in distance-based models.

Methods:

  •     min-max = (data - min) / (max - min)
  •     z-score = (data - mean) / sd, where sd is the standard deviation
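Both formulas above can be sketched in plain Python as follows. The function names and the sample `ages`/`salaries` data are illustrative assumptions; `z_score` uses the population standard deviation (dividing by n), although the sample version (dividing by n - 1) is also common.

```python
import math


def min_max(values):
    """Rescale to [0, 1]: (x - min) / (max - min). Assumes max != min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center to mean 0 and scale to sd 1: (x - mean) / sd (population sd)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


# Two hypothetical features on very different scales.
ages = [18, 25, 40, 62]
salaries = [30_000, 45_000, 80_000, 120_000]

print(min_max(ages), min_max(salaries))
print(z_score(ages), z_score(salaries))
```

After either transform the two features live on comparable scales. In practice, the min/max or mean/sd should be computed on the training data only and then reused to transform validation and test data.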

 

3. manual removal

Why:

    Sometimes we know that some columns carry no meaningful information, so we simply remove them manually.

Method:

  •     drop columns such as "ID" or "timestamp".
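A minimal sketch of manual removal, assuming the data are held as a column-oriented dict (`{column name: list of values}`); the table contents and the `drop_columns` helper are hypothetical. With pandas, the equivalent is `df.drop(columns=["ID", "timestamp"])`.

```python
def drop_columns(table, columns):
    """Return a copy of a column-oriented table without the given columns."""
    return {name: values for name, values in table.items() if name not in columns}


# Hypothetical table stored as {column name: list of values}.
table = {
    "ID": [101, 102, 103],
    "timestamp": ["2017-09-01", "2017-09-02", "2017-09-03"],
    "age": [23, 35, 41],
    "income": [3200, 4100, 5000],
}

features = drop_columns(table, {"ID", "timestamp"})
print(list(features))  # ['age', 'income']
```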

 

4. remove columns with too many nulls

Why:

    If a feature has too many null values, it is not reliable.

Method:

  •     compute the percentage of nulls in each column, and drop the columns whose percentage exceeds a chosen threshold.
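One way to sketch this step, assuming nulls are represented as `None` in a column-oriented dict. The 50% threshold is an arbitrary illustrative default, not a recommendation from the original post, and the helper names are hypothetical. In pandas, `df.isna().mean()` gives the same per-column null fraction.

```python
def null_ratio(values):
    """Fraction of entries in a column that are None."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(table, threshold=0.5):
    """Keep only columns whose null ratio is at most `threshold` (hypothetical default)."""
    return {name: values for name, values in table.items()
            if null_ratio(values) <= threshold}


# Hypothetical data: "fax_number" is mostly missing.
table = {
    "age": [23, None, 41, 30],
    "income": [3200, 4100, None, 5000],
    "fax_number": [None, None, None, "555-0100"],
}

for name, values in table.items():
    print(f"{name}: {null_ratio(values):.0%} null")
print(list(drop_sparse_columns(table)))  # ['age', 'income']
```

The right threshold depends on the data set; a stricter value such as 0.2 drops more columns.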

 

5. drop outliers

Why:

     Outliers are special cases in a data set: they do not represent the common behavior of the data. They contribute little to a model and can instead distort it and hurt its performance.

Methods:

  •     remove data points that are >= an upper extreme value or <= a lower extreme value.
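The post does not say how to choose the extreme values. The sketch below swaps in one common choice, Tukey's interquartile-range rule (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR), and then applies the removal rule above. The `readings` data and helper names are hypothetical, and `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower/upper extreme values from Tukey's rule: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Drop points >= the upper extreme or <= the lower extreme."""
    lo, hi = iqr_bounds(values, k)
    return [v for v in values if lo < v < hi]


# Hypothetical sensor readings with two obvious outliers (95 and -40).
readings = [10, 12, 11, 13, 12, 11, 14, 12, 95, -40]

print(iqr_bounds(readings))
print(drop_outliers(readings))
```

Quartiles are less affected by the outliers themselves than the mean and standard deviation are, which is why this rule is often preferred over a "mean ± 3 sd" cutoff on small samples.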

 

6. to be continued

 


Original article: http://www.cnblogs.com/fuxiaotong/p/7573585.html
