Deduplicating with drop_duplicates
import numpy as np
import pandas as pd
df = pd.DataFrame({'stu_name': ['Nancy', 'Tony', 'Tony', 'Jack', 'Jack'], 'stu_age': [17, 16, 16, 21, np.nan]})
  stu_name  stu_age
0    Nancy     17.0
1     Tony     16.0
2     Tony     16.0
3     Jack     21.0
4     Jack      NaN
df_clean = df.drop_duplicates(subset=['stu_name'])
print(df_clean)
Result:
  stu_name  stu_age
0    Nancy     17.0
1     Tony     16.0
3     Jack     21.0
df_clean2 = df.drop_duplicates(subset=['stu_name', 'stu_age'])
print(df_clean2)
Result:
  stu_name  stu_age
0    Nancy     17.0
1     Tony     16.0
3     Jack     21.0
4     Jack      NaN
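Both results keep the original index labels of the surviving rows (0, 1, 3 and 0, 1, 3, 4). As a minimal sketch, assuming a contiguous index is wanted afterwards, reset_index(drop=True) can be chained on. The data matches the table shown above, and the name df_reset is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stu_name': ['Nancy', 'Tony', 'Tony', 'Jack', 'Jack'],
    'stu_age': [17, 16, 16, 21, np.nan],
})

# drop_duplicates keeps the original labels of the surviving rows
df_clean = df.drop_duplicates(subset=['stu_name'])
print(df_clean.index.tolist())   # [0, 1, 3]

# Assumption: a fresh 0..n-1 index is wanted; drop=True discards the old labels
df_reset = df_clean.reset_index(drop=True)
print(df_reset.index.tolist())   # [0, 1, 2]
```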
Deduplicating with duplicated and drop
df = pd.DataFrame({'stu_name': ['Nancy', 'Tony', 'Tony', 'Jack', 'Jack'], 'stu_age': [17, 16, 16, 21, np.nan]})
  stu_name  stu_age
0    Nancy     17.0
1     Tony     16.0
2     Tony     16.0
3     Jack     21.0
4     Jack      NaN
duplicate_df = df[df.duplicated('stu_name')]
clean_df = df.drop(duplicate_df.index)
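To make the intermediate step visible, here is a small sketch of the same two-step approach that prints the boolean mask returned by duplicated and compares the outcome with two alternatives. The data matches the table shown above; the names mask and clean_df2 are illustrative, and the one-step form with ~mask is an alternative offered here, not something the two lines above require.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stu_name': ['Nancy', 'Tony', 'Tony', 'Jack', 'Jack'],
    'stu_age': [17, 16, 16, 21, np.nan],
})

# duplicated returns a boolean Series: True marks a repeat of an earlier stu_name
mask = df.duplicated('stu_name')
print(mask.tolist())   # [False, False, True, False, True]

# Two-step version: select the duplicate rows, then drop them by index label
clean_df = df.drop(df[mask].index)

# One-step alternative: keep only the rows that are not marked as duplicates
clean_df2 = df[~mask]

print(clean_df.equals(clean_df2))                                  # True
print(clean_df.equals(df.drop_duplicates(subset=['stu_name'])))    # True
```

Because drop removes rows by index label, the two-step version relies on the index labels being unique; the boolean-mask form does not have that requirement.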
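duplicated: first selects the duplicated rows
drop: then deletes those rows
Common parameters of drop_duplicates and duplicated
subset: a single column label or a list of column labels (optional). If it is not set, all columns are compared when identifying duplicates.
keep: which occurrence within each group of duplicates to keep (for duplicated, which one is left unmarked). Accepts 'first' (the default), 'last', or False, which treats every occurrence as a duplicate.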
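The effect of subset and keep is easiest to see side by side. The following is a minimal sketch using the same data as the table earlier in the article; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stu_name': ['Nancy', 'Tony', 'Tony', 'Jack', 'Jack'],
    'stu_age': [17, 16, 16, 21, np.nan],
})

# keep='first' (the default) retains the first row of each stu_name
first = df.drop_duplicates(subset=['stu_name'], keep='first')
print(first.index.tolist())     # [0, 1, 3]

# keep='last' retains the last row of each stu_name
last = df.drop_duplicates(subset=['stu_name'], keep='last')
print(last.index.tolist())      # [0, 2, 4]

# keep=False removes every row whose stu_name occurs more than once
no_dups = df.drop_duplicates(subset=['stu_name'], keep=False)
print(no_dups.index.tolist())   # [0]

# Without subset, whole rows are compared, so only row 2 is a duplicate
all_cols = df.drop_duplicates()
print(all_cols.index.tolist())  # [0, 1, 3, 4]

# duplicated accepts the same two parameters
marked = df.duplicated('stu_name', keep=False)
print(marked.tolist())          # [False, True, True, True, True]
```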
Original article: https://www.cnblogs.com/convict/p/14855162.html