Tags: targe, generate, deduplicate, concat, python, _id, pre, evel, set
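These are some pandas operations I used in Python, recorded here for reference:
# Build a sample DataFrame whose label column holds comma-separated values
import pandas as pd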
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
print(data)
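Output (older pandas versions sorted the dict keys alphabetically, so label is printed first; recent versions keep the insertion order and print v_id first):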
label v_id
0 a,b v_1
1 e,f,g v_2
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
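# Split label on commas, stack the pieces into one row per label, then join them back onto v_id by index
labels = data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label')
df = data.drop('label', axis=1).join(labels)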
print(df)
Output:
v_id label
0 v_1 a
0 v_1 b
1 v_2 e
1 v_2 f
1 v_2 g
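The same one-row-per-label result can probably be reached more directly with `DataFrame.explode`. The following is a minimal sketch, assuming pandas 0.25 or newer (where `explode` is available); the names `exploded` and `exploded_reset` are illustrative and not part of the original post.

```python
import pandas as pd

data = pd.DataFrame({
    'v_id': ["v_1", 'v_2'],
    'label': ["a,b", 'e,f,g'],
})

# Turn each comma-separated string into a list, then give every list element its own row.
# explode() repeats the original index (0, 0, 1, 1, 1), like the stack-based version.
exploded = data.assign(label=data['label'].str.split(',')).explode('label')
print(exploded)

# Optional: renumber the rows 0..n-1 instead of keeping the repeated index.
exploded_reset = exploded.reset_index(drop=True)
```

Keeping the repeated index is handy when the rows need to be traced back to the original frame; resetting it avoids surprises when selecting rows by index label later.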
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
df = data.drop('label', axis=1).join(data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label'))
# Keep only the rows whose label is in the given list
target_label = df.loc[df['label'].isin(["e", "f"])]
print(target_label)
Output:
v_id label
1 v_2 e
1 v_2 f
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
df = data.drop('label', axis=1).join(data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label'))
# Invert the mask with ~ to keep the rows whose label is NOT in the list
other_label = df[~df['label'].isin(["f", "g"])]
print(other_label)
Output:
v_id label
0 v_1 a
0 v_1 b
1 v_2 e
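Boolean masks like the ones used for these filters can also be combined. The sketch below rebuilds the split frame by hand so it runs on its own; the mask names (`in_v2`, `a_or_e`) and result names are made up for illustration.

```python
import pandas as pd

# Hand-built copy of the split frame, with the same repeated index.
df = pd.DataFrame({
    'v_id': ['v_1', 'v_1', 'v_2', 'v_2', 'v_2'],
    'label': ['a', 'b', 'e', 'f', 'g'],
}, index=[0, 0, 1, 1, 1])

in_v2 = df['v_id'] == 'v_2'            # rows that belong to v_2
a_or_e = df['label'].isin(['a', 'e'])  # rows labelled a or e

both = df[in_v2 & a_or_e]        # AND: v_2 rows labelled a or e
either = df[in_v2 | a_or_e]      # OR: v_2 rows, plus any row labelled a or e
neither = df[~(in_v2 | a_or_e)]  # NOT: everything else

print(both)
print(either)
print(neither)
```

Note that `&`, `|` and `~` are used instead of `and`, `or` and `not`, and each comparison needs its own parentheses when written inline.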
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
df = data.drop('label', axis=1).join(data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label'))
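# Without copy(), pandas can emit SettingWithCopyWarning (a warning, not an error) when df is a slice of another DataFrame:
# "A value is trying to be set on a copy of a slice from a DataFrame"
df = df.copy()
# Overwrite every non-empty label with "1"
df.loc[df["label"] != "", 'label'] = "1"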
print(df)
Output:
v_id label
0 v_1 1
0 v_1 1
1 v_2 1
1 v_2 1
1 v_2 1
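The copy matters most when the frame being modified was itself obtained by filtering another frame. Here is a small sketch of that pattern, assuming a hand-built frame and a hypothetical `subset` variable that do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'v_id': ['v_1', 'v_1', 'v_2', 'v_2', 'v_2'],
    'label': ['a', 'b', 'e', 'f', 'g'],
})

# Take an explicit copy of the filtered rows so the assignment below
# clearly targets a new, independent object rather than df itself.
subset = df[df['v_id'] == 'v_2'].copy()
subset.loc[subset['label'] != '', 'label'] = '1'

print(subset)  # labels of the v_2 rows are now "1"
print(df)      # the original frame is unchanged
```

Using `.loc[row_mask, column]` in a single step, rather than chained indexing such as `subset[mask]['label'] = ...`, is the form that pandas documents for assignments of this kind.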
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
df = data.drop('label', axis=1).join(data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label'))
# Convert the label column to a plain Python list
print(df["label"].values.tolist())
Output:
['a', 'b', 'e', 'f', 'g']
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
df = data.drop('label', axis=1).join(data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label'))
# Deduplicate on v_id, keeping the first row for each v_id
print(df.drop_duplicates(subset=['v_id']))
Output:
v_id label
0 v_1 a
1 v_2 e
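Which row survives deduplication is controlled by the `keep` parameter of `drop_duplicates`. The sketch below, on a hand-built copy of the split frame with illustrative variable names, compares the three options.

```python
import pandas as pd

# Hand-built copy of the split frame, with the same repeated index.
df = pd.DataFrame({
    'v_id': ['v_1', 'v_1', 'v_2', 'v_2', 'v_2'],
    'label': ['a', 'b', 'e', 'f', 'g'],
}, index=[0, 0, 1, 1, 1])

first_per_id = df.drop_duplicates(subset=['v_id'])               # default keep='first'
last_per_id = df.drop_duplicates(subset=['v_id'], keep='last')   # keep the last row per v_id
no_repeats = df.drop_duplicates(subset=['v_id'], keep=False)     # drop every v_id that repeats

print(first_per_id)
print(last_per_id)
print(no_repeats)
```

Since both `v_id` values occur more than once here, `keep=False` leaves an empty frame.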
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
df = data.drop('label', axis=1).join(data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label'))
# Start from a copy of df and append df to it `times` more times
df_copy = df.copy()
times = 2
for i in range(times):
    df_copy = pd.concat([df_copy, df])
print(df_copy)
Output:
v_id label
0 v_1 a
0 v_1 b
1 v_2 e
1 v_2 f
1 v_2 g
0 v_1 a
0 v_1 b
1 v_2 e
1 v_2 f
1 v_2 g
0 v_1 a
0 v_1 b
1 v_2 e
1 v_2 f
1 v_2 g
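The loop can likely be replaced by a single `pd.concat` call on a repeated list, which avoids re-copying the growing frame on every iteration. This is a sketch under the same assumptions as the loop (the result holds `times + 1` copies of `df`); `repeated` and `repeated_reset` are illustrative names.

```python
import pandas as pd

# Hand-built copy of the split frame, with the same repeated index.
df = pd.DataFrame({
    'v_id': ['v_1', 'v_1', 'v_2', 'v_2', 'v_2'],
    'label': ['a', 'b', 'e', 'f', 'g'],
}, index=[0, 0, 1, 1, 1])

times = 2

# One concat over (times + 1) references to df: the original plus `times` repeats.
repeated = pd.concat([df] * (times + 1))

# Same data, but with a fresh 0..n-1 index instead of the repeated one.
repeated_reset = pd.concat([df] * (times + 1), ignore_index=True)

print(repeated)
print(repeated_reset)
```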
Original article: https://www.cnblogs.com/TTyb/p/9717537.html