Quote
By "group by" we are referring to a process involving one or more of the following steps: splitting the data into groups based on some criteria; applying a function to each group independently; combining the results into a data structure. See the Grouping section.
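The three steps in the quote can be sketched by hand. The small frame below uses made-up values (an assumption, chosen so the sums are easy to check), and the names `pieces`, `sums`, `manual`, and `builtin` are illustrative only. It is a rough picture of what `groupby` does, not how pandas implements it internally.

```python
import pandas as pd

# Small deterministic frame (hypothetical values, not from the original post)
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Split: iterating over a GroupBy object yields (key, sub-frame) pairs
pieces = {key: group for key, group in df.groupby("A")}

# Apply: compute the sum of C within each piece independently
sums = {key: group["C"].sum() for key, group in pieces.items()}

# Combine: collect the per-group results into a single Series
manual = pd.Series(sums, name="C").sort_index()
manual.index.name = "A"

# The usual one-liner performs the same three steps in one call
builtin = df.groupby("A")["C"].sum()

print(manual)
print(builtin)
```

Both printed Series should hold the same per-group sums, which is the point: `groupby(...).sum()` is shorthand for split, apply, and combine.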
Code
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})
print(df)

# Sum columns C and D separately for foo and bar. Column B holds strings,
# so numeric_only=True leaves it out of the sum.
print(df.groupby('A').sum(numeric_only=True))

# Same idea with two keys: each value of A maps to several values of B
# (one-to-many), so the result is indexed by (A, B) pairs.
print(df.groupby(['A', 'B']).sum())

# Sample output (the values differ on every run because the data is random):
#
#      A      B         C         D
# 0  foo    one  0.102071 -0.301926
# 1  bar    one  1.161158  0.847451
# 2  foo    two -0.023879  0.936338
# 3  bar  three -0.353075 -0.834349
# 4  foo    two -0.272542 -1.425635
# 5  bar    two -1.016016 -0.031614
# 6  foo    one -0.428517  0.892747
# 7  foo  three -0.843796  0.614443
#
#             C         D
# A
# bar -0.207932 -0.018512
# foo -1.466663  0.715967
#
#                   C         D
# A   B
# bar one    1.161158  0.847451
#     three -0.353075 -0.834349
#     two   -1.016016 -0.031614
# foo one   -0.326445  0.590821
#     three -0.843796  0.614443
#     two   -0.296421 -0.489296
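Grouping by two keys produces a result with a two-level index, which is where the one-to-many relationship between A and B shows up. The sketch below uses a small frame with made-up values (an assumption, not the random data above) to show a few common ways of reading such a result; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical deterministic data so the sums are easy to verify
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "one"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# The result is indexed by (A, B) pairs
grouped = df.groupby(["A", "B"])[["C"]].sum()

# Select a single (A, B) combination by passing a tuple to .loc
foo_one = grouped.loc[("foo", "one"), "C"]

# Select every B sub-group under A == "foo"
foo_rows = grouped.loc["foo"]

# Turn the index levels back into ordinary columns
flat = grouped.reset_index()

print(grouped)
print(foo_one)
print(foo_rows)
print(flat)
```

`reset_index()` is handy when the grouped result needs to be merged or exported as a flat table, while the tuple form of `.loc` is convenient for quick lookups.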