Tags: main tran merge head pandas pca kaggle products spark
Dataset source: https://www.kaggle.com/psparks/instacart-market-basket-analysis
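Approach: merge the four tables (prior order items, products, orders, aisles) so that every purchased item carries its user_id and aisle name, build a user-by-aisle cross table of purchase counts, then apply PCA and keep enough components to retain 90% of the variance.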
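The merge-then-crosstab step can be sketched on toy data. The tiny frames below are made up for illustration and only mimic the column names used in the Instacart files; they are not real rows from the dataset.

```python
import pandas as pd

# Toy stand-ins for the four Instacart tables (values are invented for illustration).
prior = pd.DataFrame({"order_id": [1, 1, 2, 3], "product_id": [10, 11, 10, 12]})
products = pd.DataFrame({"product_id": [10, 11, 12], "aisle_id": [100, 101, 100]})
orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [7, 7, 8]})
aisles = pd.DataFrame({"aisle_id": [100, 101], "aisle": ["fresh fruits", "yogurt"]})

# Chain the joins so every purchased item carries its user_id and aisle name.
merged = (
    prior.merge(products, on="product_id")
         .merge(orders, on="order_id")
         .merge(aisles, on="aisle_id")
)

# Rows are users, columns are aisles, cells count the items bought in that aisle.
cross = pd.crosstab(merged["user_id"], merged["aisle"])
print(cross)
```

Each row of the cross table is a per-user feature vector with one column per aisle, which is the matrix that PCA compresses.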
Example code:
import pandas as pd
from sklearn.decomposition import PCA


def main():
    """
    Dimensionality reduction example: principal component analysis (PCA)
    :return: None
    """
    # Load the data
    prior = pd.read_csv("order_products__prior.csv")
    products = pd.read_csv("products.csv")
    orders = pd.read_csv("orders.csv")
    aisles = pd.read_csv("aisles.csv")

    # Merge the tables on their shared keys
    _mg = pd.merge(prior, products, on='product_id')
    _mg = pd.merge(_mg, orders, on='order_id')
    mt = pd.merge(_mg, aisles, on='aisle_id')
    # print(mt.head(10))

    # Cross table: rows are users, columns are aisles
    cross = pd.crosstab(mt['user_id'], mt['aisle'])
    # print(cross)

    # Keep enough principal components to retain 90% of the variance
    pca = PCA(n_components=0.9)
    data = pca.fit_transform(cross)
    print(data)
    print(data.shape)
    return None


if __name__ == '__main__':
    main()
Output:
The output shows that PCA reduced the data to 27 dimensions while retaining at least 90% of the variance.
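To see how a fractional n_components determines the number of output dimensions, here is a minimal sketch on synthetic data. The matrix X is randomly generated for illustration and is unrelated to the Instacart cross table. When n_components is a float between 0 and 1, scikit-learn keeps the smallest number of components whose cumulative explained variance ratio exceeds that fraction.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 10 features driven by 3 latent factors plus small noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

# A float n_components is read as the fraction of variance to retain.
pca = PCA(n_components=0.9)
reduced = pca.fit_transform(X)

# Cumulative share of variance explained by the components that were kept.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(reduced.shape)
print(cumulative)
```

The second dimension of reduced.shape equals pca.n_components_, and the last entry of cumulative is the share of variance actually retained. The same check applied to the fitted pca object in the example above would show how much variance the 27 components keep.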
Original article: https://www.cnblogs.com/shixinzei/p/10171751.html