Summary of Advanced pandas Operations

Posted: 2018-07-15 23:20:49


1. Quantiles of a column in pandas

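# Inspect the quantiles of a column
import numpy as np
import pandas as pd

# set the column type
my_df['col'] = my_df['col'].astype(np.float64)

# compute 4 quantile bins (quartiles)
bins_col = pd.qcut(my_df['col'], 4)
# integer bin index of each row (the old Categorical `.labels` attribute is now `.codes`)
bins_col_label = bins_col.cat.codes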

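As a small illustration of the snippet above, the sketch below runs `pd.qcut` on a made-up eight-row frame. The sample values and the extra names `bins_col_codes` and `quartiles` are assumptions for demonstration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
my_df = pd.DataFrame({'col': [1, 2, 3, 4, 5, 6, 7, 8]})
my_df['col'] = my_df['col'].astype(np.float64)

# Split the column into 4 equal-frequency bins (quartiles)
bins_col = pd.qcut(my_df['col'], 4)

# Integer bin index (0 to 3) for each row
bins_col_codes = bins_col.cat.codes

# The quartile boundaries themselves
quartiles = my_df['col'].quantile([0.25, 0.5, 0.75])
```

`pd.qcut` assigns each row to a bin, while `Series.quantile` returns the cut points; which one is needed depends on whether the goal is to label rows or to report thresholds.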

2. Multiple aggregations (group functions)

# Multiple aggregations (group functions)
# column settings
grouped_on = 'col_0'  # ['col_0', 'col_2'] for multiple columns
aggregated_column = 'col_1'

### Choice of aggregate functions
## On non-NA values in the group
## - numeric choice: mean, median, sum, std, var, min, max, prod
## - group choice: first, last, count
# list of functions to compute
agg_funcs = ['mean', 'max']

# compute aggregate values
aggregated_values = my_df.groupby(grouped_on)[aggregated_column].agg(agg_funcs)

# get the aggregates of one group; `group` is a group label (a value of col_0)
# (.ix has been removed from pandas, so use .loc)
aggregated_values.loc[group]

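To make the shape of the result concrete, here is a minimal sketch on made-up data (the sample frame and the name `group_b` are assumptions for illustration). Passing a list of function names to `agg` gives one output column per function and one row per group.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
my_df = pd.DataFrame({
    'col_0': ['a', 'a', 'b', 'b', 'b'],
    'col_1': [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One result column per aggregate function, one row per group
aggregated_values = my_df.groupby('col_0')['col_1'].agg(['mean', 'max'])

# Look up the aggregates of a single group by its label
group_b = aggregated_values.loc['b']
```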

3. Aggregating with a custom function

# Aggregating with a custom function
# column settings
grouped_on = ['col_0']
aggregated_columns = ['col_1']

def my_func(my_group_array):
    # minimum of the group times its number of non-NA values
    return my_group_array.min() * my_group_array.count()

## list of functions to compute
agg_funcs = [my_func]  # could be many

# compute aggregate values
aggregated_values = my_df.groupby(grouped_on)[aggregated_columns].agg(agg_funcs)

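The sketch below applies the same `my_func` to a made-up frame (the data and the name `result_a` are assumptions for illustration). It also mixes in the built-in `'max'` to show that custom and named functions can share one list, and that selecting a list of columns produces two-level column labels.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
my_df = pd.DataFrame({
    'col_0': ['a', 'a', 'b', 'b', 'b'],
    'col_1': [1.0, 3.0, 2.0, 4.0, 6.0],
})

def my_func(my_group_array):
    # Minimum of the group multiplied by its number of non-NA values
    return my_group_array.min() * my_group_array.count()

# A list of columns plus a list of functions yields
# two-level column labels: (column name, function name)
aggregated_values = my_df.groupby(['col_0'])[['col_1']].agg([my_func, 'max'])

# Read one cell: group 'a', column 'col_1', function my_func
result_a = aggregated_values.loc['a', ('col_1', 'my_func')]
```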

4. Using apply on a grouped DataFrame


# Using apply on a grouped DataFrame
# top n most frequent values of a column within each group
def top_n(group_df, col, n=2):
    # value_counts() sorts by frequency, so the first n rows are the top n
    bests = group_df[col].value_counts().head(n)
    return bests

# column settings
grouped_on = 'col_0'
aggregated_column = 'col'

grouped = my_df.groupby(grouped_on)
groups_top_n = grouped.apply(top_n, aggregated_column, n=3)

5. Moving average

# Moving average
import numpy as np

# X: array-like
# window: int, number of samples per window
ret = np.cumsum(np.array(X), dtype=float)
ret[window:] = ret[window:] - ret[:-window]
result = ret[window - 1:] / window

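For a quick check of the cumulative-sum trick, the sketch below wraps it in a function and compares it with a pandas rolling mean. The function name `moving_average` and the sample input are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def moving_average(X, window):
    # Running totals; after the subtraction, ret[i] holds the sum
    # of the `window` samples ending at position i
    ret = np.cumsum(np.array(X), dtype=float)
    ret[window:] = ret[window:] - ret[:-window]
    # Drop the first window - 1 entries, which cover incomplete windows
    return ret[window - 1:] / window

X = [1, 2, 3, 4, 5]
result = moving_average(X, 3)

# pandas equivalent: rolling mean with the incomplete leading windows dropped
rolling = pd.Series(X).rolling(window=3).mean().dropna().to_numpy()
```

When the data already lives in a Series, `rolling(...).mean()` is usually the more readable option; the NumPy version avoids the pandas dependency.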

6. Basic information about groups

# Basic information about groups
# column settings
grouped_on = 'col_0'  # ['col_0', 'col_1'] for multiple columns
aggregated_column = 'col_1'

### Choice of aggregate functions
## On non-NA values in the group
## - numeric choice: mean, median, sum, std, var, min, max, prod
## - group choice: first, last, count
## On the rows of the group
## - size of the group: size
aggregated_values = my_df.groupby(grouped_on)[aggregated_column].mean()
aggregated_values.name = 'mean'

# get the aggregate of one group; `group` is a group label (a value of col_0)
# (.ix has been removed from pandas, so use .loc)
aggregated_values.loc[group]

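The difference between `size` and `count` noted in the comments above only shows up when a group contains missing values. A minimal sketch on made-up data (the sample frame and variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in group 'a'
my_df = pd.DataFrame({
    'col_0': ['a', 'a', 'b', 'b', 'b'],
    'col_1': [1.0, np.nan, 2.0, 4.0, 6.0],
})

grouped = my_df.groupby('col_0')['col_1']
group_sizes = grouped.size()    # rows per group, NaN included
group_counts = grouped.count()  # non-NA values per group
group_means = grouped.mean()    # NaN is skipped
```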

7. Iterating over groups


# Iterating over groups
# column settings
grouped_on = 'col_0'  # ['col_0', 'col_1'] for multiple columns

grouped = my_df.groupby(grouped_on)

# print the column means of the first groups only, then stop
i = 0
for group_name, group_dataframe in grouped:
    if i > 10:
        break
    i += 1
    # numeric_only=True restricts the mean to numerical columns
    print(i, group_name, group_dataframe.mean(numeric_only=True))
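
The same loop can be written without a manual counter. The sketch below is one possible variant on made-up data (the sample frame, `seen`, and `group_a` are assumptions for illustration): `itertools.islice` caps the number of groups visited, and `get_group` fetches a single group by key.

```python
from itertools import islice

import pandas as pd

# Hypothetical sample data, for illustration only
my_df = pd.DataFrame({
    'col_0': ['a', 'a', 'b', 'c'],
    'col_1': [1.0, 3.0, 2.0, 5.0],
})
grouped = my_df.groupby('col_0')

# Visit only the first two groups, in sorted key order
seen = []
for group_name, group_dataframe in islice(grouped, 2):
    seen.append((group_name, group_dataframe['col_1'].mean()))

# Fetch one group directly by its key
group_a = grouped.get_group('a')
```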

8. Maximal information coefficient (MIC)

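# Maximal information coefficient (MIC)
import numpy as np
from minepy import MINE  # MINE comes from the minepy package

# mic_scores is a hypothetical wrapper name; the original snippet had a bare `return`
def mic_scores(X):
    # MIC of the first column of X against each remaining column
    matrix = np.transpose(np.array(X)).astype(float)
    mine = MINE(alpha=0.6, c=15, est="mic_approx")
    mic_result = []
    for i in matrix[1:]:
        mine.compute_score(matrix[0], i)
        mic_result.append(mine.mic())
    return mic_result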


9. Pearson correlation coefficient

import numpy as np

matrix = np.transpose(np.array(X))
np.corrcoef(matrix[0], matrix[1])[0, 1]

# X: array-like
# https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.corrcoef.html

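As a sanity check, the sketch below computes the coefficient with `np.corrcoef` and again with `Series.corr`, which uses Pearson by default. The sample values and the names `r_numpy`, `r_pandas`, and `df` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each row is one observation (x, y)
X = [[1, 2], [2, 4], [3, 6], [4, 9]]

# NumPy: transpose so that each row of `matrix` is one variable
matrix = np.transpose(np.array(X))
r_numpy = np.corrcoef(matrix[0], matrix[1])[0, 1]

# pandas: Series.corr computes the Pearson coefficient by default
df = pd.DataFrame(X, columns=['x', 'y'])
r_pandas = df['x'].corr(df['y'])
```

`np.corrcoef` returns the full 2x2 correlation matrix, which is why the original snippet indexes it with `[0, 1]`.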


Original article: https://www.cnblogs.com/jean925/p/9315291.html
