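1. mydf.dropna(subset=['col1', 'col2'], inplace=True)  # drop rows where col1 or col2 is NaN
2. Replace NaN with a sentinel value, then drop the rows that carry it:
import numpy as np
import pandas as pd
mydf = pd.DataFrame({
    'name': ['Tom', 'Amy', 'John', 'George'],
    'sex': ['male', 'female', np.nan, 'male'],
    'number': ['SA1001', 'SA1002', 'SA1003', 'SA1004'],
    'grade': [11, 22, 33, 44]
})
mydf['sex'].isnull().value_counts()  # count missing vs. present values
mydf['sex'] = mydf['sex'].fillna('999')  # mark missing entries with a sentinel
pos = mydf[mydf.sex == '999'].index.tolist()  # index labels of the marked rows
mydf = mydf.drop(pos)  # or: mydf.drop(pos, inplace=True)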
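Both approaches above remove the rows with a missing value. The following is a minimal sketch of the same idea without a sentinel value; the frame `people` and its contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing 'sex' value.
people = pd.DataFrame({
    "name": ["Tom", "Amy", "John", "George"],
    "sex": ["male", "female", np.nan, "male"],
    "grade": [11, 22, 33, 44],
})

# Drop rows whose 'sex' is missing; without inplace=True a new frame is returned.
cleaned = people.dropna(subset=["sex"])

# Equivalent boolean-mask form: keep only the rows where 'sex' is present.
cleaned_mask = people[people["sex"].notna()]

print(cleaned)
```

Skipping the sentinel avoids the risk that '999' collides with a real value in the column.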
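`df[df.isnull().T.any()]` selects the rows that contain at least one NaN.
Without the transpose, `frame3.isnull().any()` evaluates any() over each column and returns a Series indexed by column name.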
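To make the row-wise versus column-wise distinction concrete, here is a small sketch. The frame `frame` is a made-up example, and `any(axis=1)` is used as what should be an equivalent spelling of the transposed form.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'a' has one NaN, column 'b' has none.
frame = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

# Row-wise: True for every row that holds at least one NaN.
row_flags = frame.isnull().any(axis=1)
rows_with_nan = frame[row_flags]

# The transposed spelling from the text yields the same flags.
row_flags_transposed = frame.isnull().T.any()

# Column-wise: one boolean per column, indexed by column name.
cols_with_nan = frame.isnull().any()

print(rows_with_nan)
print(cols_with_nan)
```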
1. del mydf['column_name']  # modifies the original DataFrame in place
2. mydf.drop('column_name', axis=1, inplace=True)
1. col = ['column1', 'column2']  # several columns given by name
mydf.drop(labels=col, axis=1, inplace=True)
2. mydf.drop(mydf.columns[[2, 4]], axis=1, inplace=True)  # several columns given by position
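The column-deletion variants above can be sketched as follows. The frame `scores` is invented for the example, and the `columns=` keyword is used here as an alternative to `axis=1`.

```python
import pandas as pd

# Hypothetical frame with four columns.
scores = pd.DataFrame({
    "name": ["Tom", "Amy"],
    "age": [18, 22],
    "math": [75, 98],
    "english": [67, 70],
})

# Drop by name; without inplace=True the original frame is left untouched.
by_name = scores.drop(columns=["age"])

# Drop by position: look the names up through scores.columns first.
by_position = scores.drop(columns=scores.columns[[1, 3]])

print(by_name.columns.tolist())
print(by_position.columns.tolist())
```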
`pd.merge(df_left, df_right, how='left', on='column_name')` (or how='right') merges the columns of two DataFrames. The column named by on must exist in both DataFrames.
df1 = pd.DataFrame({'name': ['Tom', 'Amy', 'John', 'George'],
                    'sex': ['male', 'female', np.nan, 'male'],
                    'number': ['SA1001', 'SA1002', 'SA1003', 'SA1004']})
df2 = pd.DataFrame({'name': ['Tom', 'Amy', 'John', 'George'],
                    'age': [18, 22, 25, 20],
                    'grade': [77, 88, 99, 86]})
df3 = pd.DataFrame({'name': ['Tom', 'Amy', 'Jack', 'George'],
                    'age': [18, 22, 25, 20],
                    'grade': [77, 88, 90, 86]})
print(pd.merge(df1, df2, how='left', on='name'))
# Jack's row (only in df3) is dropped; John is in df1 but not in df3, so his age and grade are NaN
print(pd.merge(df1, df3, how='left', on='name'))
# John's row (only in df1) is dropped; Jack is in df3 but not in df1, so his sex and number are NaN
print(pd.merge(df1, df3, how='right', on='name'))
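The left/right behaviour described in the comments above can be checked with a reduced sketch. The frames `left` and `right` are trimmed-down, hypothetical stand-ins for df1 and df3.

```python
import pandas as pd

# 'John' exists only in the left frame, 'Jack' only in the right frame.
left = pd.DataFrame({
    "name": ["Tom", "Amy", "John", "George"],
    "number": ["SA1001", "SA1002", "SA1003", "SA1004"],
})
right = pd.DataFrame({
    "name": ["Tom", "Amy", "Jack", "George"],
    "age": [18, 22, 25, 20],
})

# how='left' keeps every key of the left frame; John's age becomes NaN.
left_join = pd.merge(left, right, how="left", on="name")

# how='right' keeps every key of the right frame; Jack's number becomes NaN.
right_join = pd.merge(left, right, how="right", on="name")

print(left_join)
print(right_join)
```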
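pd.concat([df1, df2], axis=1) concatenates column-wise (side by side). Rows are aligned on the index, so both frames should have the same number of rows; an index position present in only one frame is filled with NaN.
pd.concat([df1, df2], axis=0, ignore_index=True) concatenates row-wise. Columns that exist in only one frame are filled with NaN. With ignore_index=True the result is renumbered from 0, so the rows taken from df2 continue counting where the rows of df1 end.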
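A short sketch of both directions, using two made-up frames `a` and `b` that differ in row count and column names:

```python
import pandas as pd

# Hypothetical frames: different lengths, one shared column name.
a = pd.DataFrame({"name": ["Tom", "Amy"], "age": [18, 22]})
b = pd.DataFrame({"name": ["John"], "grade": [99]})

# Row-wise: the union of columns is kept, missing cells become NaN,
# and ignore_index=True renumbers the rows 0, 1, 2.
stacked = pd.concat([a, b], axis=0, ignore_index=True)

# Column-wise: rows are aligned on the index, so index 1 has no partner in b.
side_by_side = pd.concat([a, b], axis=1)

print(stacked)
print(side_by_side)
```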
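df1.append(df2, ignore_index=True) adds the rows of df2 below those of df1.
Any column of df2 that does not exist in df1 is added as a new column.
The result is therefore the same as pd.concat([df1, df2], axis=0, ignore_index=True).
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat with current versions.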
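For pandas versions without `append`, adding a single row can be sketched with pd.concat. The frame `roster` and the extra `city` column are hypothetical and serve only to show that an unknown column is added.

```python
import pandas as pd

roster = pd.DataFrame({"name": ["Tom", "Amy"], "age": [18, 22]})

# Wrap the new row in a one-row DataFrame; 'city' does not exist in roster yet.
new_row = pd.DataFrame([{"name": "John", "city": "Paris"}])

# Same effect as the old roster.append(new_row, ignore_index=True).
roster = pd.concat([roster, new_row], ignore_index=True)

print(roster)
```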
1. df.values --> the resulting array may contain strings
2. df.as_matrix() --> deprecated, and removed in pandas 1.0; use df.to_numpy() instead
3. np.array(df)
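The conversions listed above can be compared in a small sketch. The frame is a made-up example, and `to_numpy()` stands in for the removed `as_matrix()`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Tom", "Amy"], "age": [18, 22]})

# Mixed string/integer columns produce an object-dtype array.
mixed = df.to_numpy()

# Selecting only numeric columns keeps a numeric dtype.
ages = df[["age"]].to_numpy()

# np.array(df) gives the same two-dimensional layout.
via_numpy = np.array(df)

print(mixed.dtype, ages.dtype, via_numpy.shape)
```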
1. df[['col1', 'col2', 'col3']].min(), df[['col1', 'col2', 'col3']].max(), df[['col1', 'col2', 'col3']].mean()
2. numpy.min(df[['col1', 'col2', 'col3']])
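max_min_scaler = lambda x: (x - np.min(x)) / (np.max(x) - np.min(x))  # min-max normalisation of one column
print(df.iloc[:, 2:].apply(max_min_scaler))  # method 1: select the columns by position
print(df[['chinese_grade', 'math_grade', 'english_grade']].apply(max_min_scaler))  # method 2: select the columns by name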
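The scaler above maps each column onto the range [0, 1]. A minimal sketch with made-up grades shows the same computation written with `apply` and as plain vectorised arithmetic:

```python
import pandas as pd

# Hypothetical grades; each column is scaled independently.
grades = pd.DataFrame({"chinese": [77, 88, 99, 86], "math": [75, 98, 88, 66]})

# Column-by-column via apply, as in the text.
via_apply = grades.apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# The same result without apply: min() and max() broadcast over the columns.
scaled = (grades - grades.min()) / (grades.max() - grades.min())

print(scaled)
```

Columns in which every value is equal would divide by zero, so they need separate handling.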
df = pd.DataFrame({'name': ['Tom', 'Amy', 'John', 'George'],
                   'age': [18, 22, 25, 20],
                   'chinese_grade': [77, 88, 99, 86],
                   'math_grade': [75, 98, 88, 66],
                   'english_grade': [67, 70, 59, 78]})
df = df[['age', 'name', 'chinese_grade', 'math_grade', 'english_grade']]  # reorder the columns
df.index = range(4, 0, -1)  # new index 4, 3, 2, 1
df.index = [4, 3, 2, 1]  # same index written out as a list
df.index = ['a', 'b', 'c', 'd']  # the index may also consist of strings
mask = [False, True, False, True]
df_mask = df[mask]
print(df_mask)
print(df['name'] == 'Amy')
print(df[df['name'] == 'Amy'])
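The mask and the comparison above are both boolean sequences, so several conditions can be combined with `&` and `|`. A sketch under that assumption, with a reduced copy of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Tom", "Amy", "John", "George"],
    "age": [18, 22, 25, 20],
    "math_grade": [75, 98, 88, 66],
})

# A plain list of booleans selects the rows marked True.
mask = [False, True, False, True]
picked = df[mask]

# Each condition needs its own parentheses because & binds tighter than >=.
older_and_strong = df[(df["age"] >= 20) & (df["math_grade"] > 70)]

print(picked)
print(older_and_strong)
```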
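case=False ignores case (the default, case=True, is case-sensitive); na=True treats NaN as a match, na=False as a non-match
print(df['name'].str.contains('o', case=False, na=False))
print(df[df['name'].str.contains('o')])
df['name'].str.startswith('A')
df['name'].str.endswith('e')
df['name'].str.match(pattern)  # pattern is a regular expression, matched from the start of each string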
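The string filters above can be tried on a small Series. The names and the regular expression are chosen for illustration only, and `na=False` is passed everywhere so that the missing entry counts as a non-match.

```python
import numpy as np
import pandas as pd

names = pd.Series(["Tom", "Amy", np.nan, "George"])

# Case-insensitive substring search.
has_o = names.str.contains("o", case=False, na=False)

# Prefix and suffix tests.
starts_a = names.str.startswith("A", na=False)
ends_e = names.str.endswith("e", na=False)

# Regex anchored at the start: one capital letter followed by exactly two lowercase letters.
three_letters = names.str.match(r"[A-Z][a-z]{2}$", na=False)

print(names[has_o])
```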
Original article: https://www.cnblogs.com/Higgerw/p/14087574.html