Tags: mode out remove ase not arm com rem input
1. remove skew
Why:
Many models are built on the hypothesis that the input data follow a normal (Gaussian) distribution. So the closer the input data are to a normal distribution, the better the results tend to be.
Methods:
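One common way to reduce skew is a log transform. The sketch below is only an illustration under assumptions: the helper name `reduce_skew`, the 0.75 threshold, and the sample columns are all hypothetical, and `log1p` is just one possible transform (it only makes sense for non-negative, right-skewed values).

```python
import numpy as np
import pandas as pd


def reduce_skew(df, threshold=0.75):
    """Apply log1p to numeric columns that are non-negative and right-skewed.

    The threshold of 0.75 is an arbitrary illustrative cutoff, not a rule.
    """
    out = df.copy()
    numeric_cols = out.select_dtypes(include=[np.number]).columns
    skewness = out[numeric_cols].skew()
    for col in numeric_cols:
        non_negative = (out[col].dropna() >= 0).all()
        # Only positive (right) skew is handled; log1p would worsen left skew.
        if skewness[col] > threshold and non_negative:
            out[col] = np.log1p(out[col])
    return out


# Hypothetical sample data: "income" has a long right tail, "age" is symmetric.
df = pd.DataFrame({
    "income": [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 50.0, 400.0],
    "age": [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0],
})
transformed = reduce_skew(df)
print(df.skew())
print(transformed.skew())
```

After the transform, the skewness of `income` should be smaller than before, while `age` is left untouched because it is not skewed.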
2. standardization
Why:
Different features have different scales. Standardization avoids giving too high a weight to features with a large scale.
Methods:
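A minimal sketch of z-score standardization with pandas, assuming every numeric column should be rescaled to mean 0 and standard deviation 1. The helper name `standardize` and the sample columns are hypothetical; other scalers (for example min-max scaling) would work just as well here.

```python
import numpy as np
import pandas as pd


def standardize(df):
    """Rescale each numeric column to zero mean and unit (population) std."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include=[np.number]).columns
    mean = out[numeric_cols].mean()
    std = out[numeric_cols].std(ddof=0)
    # Avoid division by zero for constant columns: they simply become all 0.
    std = std.replace(0, 1.0)
    out[numeric_cols] = (out[numeric_cols] - mean) / std
    return out


# Hypothetical sample data with very different scales.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "salary": [30000.0, 40000.0, 50000.0, 60000.0],
})
scaled = standardize(df)
print(scaled)
```

In a real train/test setup, the mean and standard deviation would normally be computed on the training data only and then reused on the test data.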
3. manual remove
Why:
Sometimes we know that some columns are meaningless, so we just remove them manually.
Method:
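With pandas this is usually a single `drop` call. The column names below (`id`, `name`) are made up for illustration; which columns count as meaningless depends entirely on the data set.

```python
import pandas as pd

# Hypothetical sample data: "id" and "name" carry no predictive information.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "feature": [0.5, 0.7, 0.1],
    "target": [1, 0, 1],
})

columns_to_drop = ["id", "name"]
# drop returns a new DataFrame; the original df is left unchanged.
cleaned = df.drop(columns=columns_to_drop)
print(cleaned.columns.tolist())
```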
4. remove columns with too many nulls
Why:
If a feature has too many nulls, it's not reliable.
Method:
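One possible sketch: compute the fraction of nulls per column and keep only the columns under a cutoff. The helper name `drop_sparse_columns` and the 50% cutoff are assumptions for illustration; the right cutoff depends on the data.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_null_ratio=0.5):
    """Keep only columns whose fraction of nulls is at most max_null_ratio."""
    null_ratio = df.isnull().mean()  # fraction of nulls per column
    return df.loc[:, null_ratio <= max_null_ratio]


# Hypothetical sample data: "b" is 75% null, "a" is 25% null, "c" has none.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})
reduced = drop_sparse_columns(df)
print(reduced.columns.tolist())
```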
5. drop outlier
Why:
Outliers are the special cases in a set of data. They don't represent the common experience, so they will not contribute to a model. On the contrary, they can be harmful to it.
Methods:
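A common rule of thumb is the interquartile range (IQR) rule: treat values further than 1.5 × IQR from the quartiles as outliers. The sketch below assumes that rule; the helper name `drop_outliers_iqr`, the factor `k=1.5`, and the sample column are illustrative choices, and other criteria (such as z-scores or domain knowledge) are equally valid.

```python
import pandas as pd


def drop_outliers_iqr(df, column, k=1.5):
    """Drop rows whose value in `column` lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Rows where the value is null are dropped as well, because
    `between` evaluates to False for nulls.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    mask = df[column].between(lower, upper)
    return df[mask]


# Hypothetical sample data: 300.0 is far away from the other prices.
df = pd.DataFrame({"price": [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 300.0]})
filtered = drop_outliers_iqr(df, "price")
print(filtered)
```

Outlier removal should normally be applied to the training data only, since rows in the test data cannot be thrown away at prediction time.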
6. to be continued
Original article: http://www.cnblogs.com/fuxiaotong/p/7573585.html