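Reference for this article: http://scikit-learn.org/stable/data_transforms.html
This post mainly covers data preprocessing, in four parts:
data cleaning, dimensionality reduction (PCA and related methods), dimensionality expansion (kernel methods), and extraction of custom features.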
Ha, paying attention to preprocessing really is the more reliable bet.
The key passage, quoted from the original documentation: scikit-learn provides a library of transformers, which may clean (see Preprocessing data), reduce (see Unsupervised dimensionality reduction), expand (see Kernel Approximation) or generate (see Feature extraction) feature representations.
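Differences between fit, transform, and fit_transform:
fit: learns the model's parameters from the training set (for example, the variance or the median; it may also be a vocabulary).
transform: converts training or test data using the parameters learned by fit (for example, scaling the test set with the learned variance or median, or representing test documents as vectors over the vocabulary obtained by fit).
fit_transform: performs fit and transform in a single step.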
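The vocabulary case mentioned above can be sketched with CountVectorizer. This is only a minimal illustration: the tiny documents are made up for the example, and the variable names are not from the original article.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up toy documents, purely for illustration.
train_docs = ["the cat sat", "the dog barked"]
test_docs = ["the cat meowed"]

vectorizer = CountVectorizer()

# fit: learn the vocabulary from the training documents only.
vectorizer.fit(train_docs)
vocabulary = sorted(vectorizer.vocabulary_)

# transform: express the test documents as count vectors over that vocabulary.
# Words never seen during fit ("meowed" here) are simply ignored.
test_vectors = vectorizer.transform(test_docs)

# fit_transform: learn the vocabulary and vectorize the training set in one call.
train_vectors = CountVectorizer().fit_transform(train_docs)
```

Here the test vector has one column per word seen during fit, so its width depends on the training set and not on the test set.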
Like other estimators, these are represented by classes with a fit method, which learns model parameters (e.g. mean and standard deviation for normalization) from a training set, and a transform method which applies this transformation model to unseen data. fit_transform may be more convenient and efficient for modelling and transforming the training data simultaneously.
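As a rough sketch of the normalization case described in the quote, the snippet below uses StandardScaler. The numbers are toy data invented for this example, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up toy data: three training rows and one unseen test row, two features each.
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = StandardScaler()

# fit: learn the per-feature mean and standard deviation from the training set.
scaler.fit(X_train)
learned_mean = scaler.mean_
learned_scale = scaler.scale_

# transform: apply the parameters learned from the training set to unseen data.
X_test_scaled = scaler.transform(X_test)

# fit_transform: fit and transform the training data in a single call.
X_train_scaled = StandardScaler().fit_transform(X_train)
```

The test row is scaled with the training mean and standard deviation; nothing is re-estimated from the test set.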
There are eight main sections. Translations will be added gradually:
4.1. Pipeline and FeatureUnion: combining estimators
4.1.1. Pipeline: chaining estimators
4.1.2. FeatureUnion: composite feature spaces
For the translated article, see: http://blog.csdn.net/mmc2015/article/details/46991465
4.2.3. Text feature extraction
For the translated article, see: http://blog.csdn.net/mmc2015/article/details/46997379
4.2.4. Image feature extraction
For the translated article, see: http://blog.csdn.net/mmc2015/article/details/46992105
For the translated article, see: http://blog.csdn.net/mmc2015/article/details/47016313
4.3.1. Standardization, or mean removal and variance scaling
4.3.4. Encoding categorical features
4.3.5. Imputation of missing values
4.4. Unsupervised dimensionality reduction
For the translated article, see: http://blog.csdn.net/mmc2015/article/details/47066239
4.4.1. PCA: principal component analysis
4.4.3. Feature agglomeration
For the translated article, see: http://blog.csdn.net/mmc2015/article/details/47067003
4.5.1. The Johnson-Lindenstrauss lemma
4.5.2. Gaussian random projection
4.5.3. Sparse random projection
For the translated article, see: http://blog.csdn.net/mmc2015/article/details/47068223
4.6.1. Nystroem Method for Kernel Approximation
4.6.2. Radial Basis Function Kernel
4.6.3. Additive Chi Squared Kernel
4.6.4. Skewed Chi Squared Kernel
4.7. Pairwise metrics, Affinities and Kernels
For the translated article, see: http://blog.csdn.net/mmc2015/article/details/47068895
4.8. Transforming the prediction target (y)
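For the translated article, see: http://blog.csdn.net/mmc2015/article/details/47069869
scikit-learn: 4. Dataset preprocessing (cleaning data, reducing dimensionality, expanding dimensionality, generating features by extraction)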
Original article: http://www.cnblogs.com/wzzkaifa/p/7227122.html