Concatenating and Merging in pandas

Posted: 2019-10-20 19:56:51

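pandas offers two families of operations for combining data:

Concatenation: pd.concat, DataFrame.append (append was deprecated in pandas 1.4 and removed in 2.0)
Merging: pd.merge, DataFrame.join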

import pandas as pd
import numpy as np
from pandas import DataFrame,Series

I. Concatenating with pd.concat()

pd.concat works much like np.concatenate, with a few additional parameters:

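objs: the sequence of objects to concatenate
axis=0: the axis to concatenate along (0 stacks rows, 1 places frames side by side)
keys: labels for each input, added as an outer index level
join='outer' / 'inner': how labels on the other axis are handled. 'outer' keeps the union of labels, matched or not, and fills the gaps with NaN; 'inner' keeps only the labels that every input shares
ignore_index=False: when True, discards the existing labels along the concatenation axis and renumbers from 0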
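The effect of keys and ignore_index is easier to see on fixed data. The following is a minimal sketch; the frames `a` and `b` and the labels "first" and "second" are invented for illustration.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# ignore_index=True discards the original row labels and renumbers 0..n-1
renumbered = pd.concat([a, b], ignore_index=True)

# keys adds an outer index level recording which input each row came from
labelled = pd.concat([a, b], keys=["first", "second"])

print(renumbered)
print(labelled.loc["second"])  # only the rows that came from b
```

Without either option the result keeps the repeated labels 0, 1, 0, 1, which is usually the reason to reach for one of the two.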

1) Matching concatenation

The row and column indexes of the two frames are identical

df1 = DataFrame(data=np.random.randint(0,100,size=(3,4)))
df1

	0	1	2	3
0	61	89	68	51
1	46	79	1	55
2	52	4	72	18

df2 = DataFrame(data=np.random.randint(0,100,size=(3,4)))
df2

	0	1	2	3
0	15	62	20	78
1	60	79	70	58
2	71	87	20	95

pd.concat((df1,df2),axis=0)  # axis=0 stacks the rows vertically

	0	1	2	3
0	61	89	68	51
1	46	79	1	55
2	52	4	72	18
0	15	62	20	78
1	60	79	70	58
2	71	87	20	95
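For comparison, the same kind of frames can be joined horizontally with axis=1. This is a small sketch with deterministic data; the names `left`, `right`, and `wide` are assumptions made for the example, not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.arange(12).reshape(3, 4))
right = pd.DataFrame(np.arange(12, 24).reshape(3, 4))

# axis=1 places the frames side by side, aligning them on the row index
wide = pd.concat((left, right), axis=1)

print(wide.shape)           # (3, 8)
print(list(wide.columns))   # column labels 0..3 appear twice
```

Note that the column labels are repeated, just as the row labels 0, 1, 2 are repeated in the vertical case above.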

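2) Non-matching concatenation

"Non-matching" means the indexes on the other axis differ: the column indexes differ when stacking vertically, or the row indexes differ when joining horizontally

There are two ways to handle this:

Outer join: keep all labels and fill the gaps with NaN (the default)
Inner join: keep only the labels that match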

df1 = DataFrame(data=np.random.randint(0,100,size=(3,4)))
df2 = DataFrame(data=np.random.randint(0,100,size=(3,3)))

pd.concat((df1,df2),axis=0)

	0	1	2	3
0	55	61	54	56.0
1	10	14	6	62.0
2	39	27	99	81.0
0	31	49	80	NaN
1	73	42	44	NaN
2	67	68	97	NaN

pd.concat((df1,df2),axis=0,join='inner')  # inner join: only the matching columns are kept

	0	1	2
0	55	61	54
1	10	14	6
2	39	27	99
0	31	49	80
1	73	42	44
2	67	68	97
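The two join modes can be checked on named columns, which makes the dropped column easier to spot. A minimal sketch, assuming two invented frames `top` and `bottom`:

```python
import pandas as pd

top = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
bottom = pd.DataFrame({"a": [7, 8], "b": [9, 10]})

outer = pd.concat((top, bottom), axis=0)                # union of columns
inner = pd.concat((top, bottom), axis=0, join="inner")  # intersection of columns

print(outer["c"].isna().sum())  # 2 rows have no value for column c
print(outer["c"].dtype)         # float64, because NaN cannot be stored in an int column
print(list(inner.columns))      # ['a', 'b']
```

The dtype change explains why the last column in the outer result above is printed as 56.0 rather than 56.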

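II. Merging with pd.merge()

Unlike concat, merge combines two frames by matching the values in one or more shared columns

By default, pd.merge() uses every column name that appears in both frames as the key.

The rows do not need to be in the same order in the two frames

Parameters:

how: 'inner' keeps the intersection of keys (inner join, the default), 'outer' keeps the union (outer join); 'left' and 'right' keep every key from one side
on: when the frames share more than one column, on selects which to use as the key; it accepts a single column name or a list of names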
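The difference between passing one column and a list of columns to on can be sketched as follows. The frames `sales` and `costs` and their values are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({"region": ["N", "N", "S"],
                      "year": [2018, 2019, 2018],
                      "sales": [10, 12, 7]})
costs = pd.DataFrame({"region": ["N", "N", "S"],
                      "year": [2018, 2019, 2019],
                      "cost": [4, 5, 3]})

# A list of columns: a row matches only if region AND year both agree
by_both = pd.merge(sales, costs, on=["region", "year"])

# A single column: every N row pairs with every N row, and the
# non-key "year" column is split into year_x and year_y
by_region = pd.merge(sales, costs, on="region")

print(len(by_both), len(by_region))
```

With these values the two-column key yields 2 rows and the single-column key yields 5, so choosing the key changes the result size as well as the column names.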

1) One-to-one merge

df1 = DataFrame({'employee':['Bob','Jake','Lisa'],
                'group':['Accounting','Engineering','Engineering'],
                })
df1

	employee	group
0	Bob	Accounting
1	Jake	Engineering
2	Lisa	Engineering

df2 = DataFrame({'employee':['Lisa','Bob','Jake'],
                'hire_date':[2004,2008,2012],
                })
df2

	employee	hire_date
0	Lisa	2004
1	Bob	2008
2	Jake	2012

pd.merge(df1, df2)  # merged on employee, the only shared column

	employee	group	hire_date
0	Bob	Accounting	2008
1	Jake	Engineering	2012
2	Lisa	Engineering	2004

2) Many-to-one merge

df3 = DataFrame({
    'employee':['Lisa','Jake'],
    'group':['Accounting','Engineering'],
    'hire_date':[2004,2016]})
df3

	employee	group	hire_date
0	Lisa	Accounting	2004
1	Jake	Engineering	2016

df4 = DataFrame({'group':['Accounting','Engineering','Engineering'],
                       'supervisor':['Carly','Guido','Steve']
                })
df4

	group	supervisor
0	Accounting	Carly
1	Engineering	Guido
2	Engineering	Steve

pd.merge(df3, df4)

	employee	group	hire_date	supervisor
0	Lisa	Accounting	2004	Carly
1	Jake	Engineering	2016	Guido
2	Jake	Engineering	2016	Steve
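When the expected relationship between the keys matters, pd.merge can check it through its validate parameter. The sketch below reuses df3 and df4 from above; the variable names `checked` and `one_to_one_ok` are introduced only for this example.

```python
import pandas as pd

df3 = pd.DataFrame({"employee": ["Lisa", "Jake"],
                    "group": ["Accounting", "Engineering"],
                    "hire_date": [2004, 2016]})
df4 = pd.DataFrame({"group": ["Accounting", "Engineering", "Engineering"],
                    "supervisor": ["Carly", "Guido", "Steve"]})

# Passes: group is unique in df3 and repeated in df4
checked = pd.merge(df3, df4, on="group", validate="one_to_many")

# Raises MergeError: the keys on the right are not unique
try:
    pd.merge(df3, df4, on="group", validate="one_to_one")
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False

print(len(checked), one_to_one_ok)
```

This is optional, but it turns an unexpectedly duplicated key into an error instead of silently multiplying rows.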

3) Many-to-many merge

df1 = DataFrame({'employee':['Bob','Jake','Lisa'],
                 'group':['Accounting','Engineering','Engineering']})
df1

	employee	group
0	Bob	Accounting
1	Jake	Engineering
2	Lisa	Engineering

df2 = DataFrame({'group':['Engineering','Engineering','HR'],
                'supervisor':['Carly','Guido','Steve']
                })
df2

	group	supervisor
0	Engineering	Carly
1	Engineering	Guido
2	HR	Steve

pd.merge(df1,df2,how='right')  # how='right' is a right join: every key in df2 is kept

	employee	group	supervisor
0	Jake	Engineering	Carly
1	Lisa	Engineering	Carly
2	Jake	Engineering	Guido
3	Lisa	Engineering	Guido
4	NaN	HR	Steve
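To see how the choice of how changes a many-to-many result, the row counts for all four join types can be compared on the same two frames. A small sketch; `row_counts` is a name made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                    "group": ["Accounting", "Engineering", "Engineering"]})
df2 = pd.DataFrame({"group": ["Engineering", "Engineering", "HR"],
                    "supervisor": ["Carly", "Guido", "Steve"]})

# Row count for each join type on the shared "group" column
row_counts = {how: len(pd.merge(df1, df2, how=how))
              for how in ("inner", "left", "right", "outer")}
print(row_counts)  # {'inner': 4, 'left': 5, 'right': 5, 'outer': 6}
```

The two Engineering rows on each side produce 2 x 2 = 4 matches; 'left' adds the unmatched Accounting row, 'right' adds the unmatched HR row, and 'outer' adds both.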

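4) Specifying the key

When the frames share more than one column name, use on= to choose which column is the key; suffixes= then controls how the remaining clashing column names are renamed (the default is ('_x', '_y'))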

df1 = DataFrame({'employee':['Jack',"Summer","Steve"],
                 'group':['Accounting','Finance','Marketing']})
df1

	employee	group
0	Jack	Accounting
1	Summer	Finance
2	Steve	Marketing

df2 = DataFrame({'employee':['Jack','Bob',"Jake"],
                 'hire_date':[2003,2009,2012],
                'group':['Accounting','sell','ceo']})
df2

	employee	group	hire_date
0	Jack	Accounting	2003
1	Bob	sell	2009
2	Jake	ceo	2012

pd.merge(df1,df2,on='employee')  # without on=, both employee and group would be used as the key; on= names the key explicitly

	employee	group_x	group_y	hire_date
0	Jack	Accounting	Accounting	2003
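The suffixes parameter mentioned above is not shown in the call, so here is a minimal sketch of it on the same data. The suffix strings "_left" and "_right" are arbitrary choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"employee": ["Jack", "Summer", "Steve"],
                    "group": ["Accounting", "Finance", "Marketing"]})
df2 = pd.DataFrame({"employee": ["Jack", "Bob", "Jake"],
                    "hire_date": [2003, 2009, 2012],
                    "group": ["Accounting", "sell", "ceo"]})

# No on=: both shared columns (employee and group) form the key
both_keys = pd.merge(df1, df2)

# on="employee": group clashes and is renamed with the chosen suffixes
renamed = pd.merge(df1, df2, on="employee", suffixes=("_left", "_right"))

print(sorted(both_keys.columns))
print(sorted(renamed.columns))
```

In the first result group stays a single column because it is part of the key; in the second it appears once per input frame.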

When the two frames have no column name in common, use left_on and right_on to name the key column on each side

df1 = DataFrame({'employee':['Bobs','Linda','Bill'],
                'group':['Accounting','Product','Marketing'],
               'hire_date':[1998,2017,2018]})
df1

	employee	group	hire_date
0	Bobs	Accounting	1998
1	Linda	Product	2017
2	Bill	Marketing	2018

df2 = DataFrame({'name':['Lisa','Bobs','Bill'],
                'hire_dates':[1998,2016,2007]})
df2

	hire_dates	name
0	1998	Lisa
1	2016	Bobs
2	2007	Bill

pd.merge(df1,df2,left_on='employee',right_on='name',how='outer')

	employee	group	hire_date	hire_dates	name
0	Bobs	Accounting	1998.0	2016.0	Bobs
1	Linda	Product	2017.0	NaN	NaN
2	Bill	Marketing	2018.0	2007.0	Bill
3	NaN	NaN	NaN	1998.0	Lisa
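Because left_on and right_on keep both key columns, the result carries the same information twice for matched rows. One common follow-up, sketched here with an inner merge and a hypothetical result name `matched`, is to drop the redundant column.

```python
import pandas as pd

df1 = pd.DataFrame({"employee": ["Bobs", "Linda", "Bill"],
                    "group": ["Accounting", "Product", "Marketing"],
                    "hire_date": [1998, 2017, 2018]})
df2 = pd.DataFrame({"name": ["Lisa", "Bobs", "Bill"],
                    "hire_dates": [1998, 2016, 2007]})

# Inner merge (the default), then remove the duplicate key column
matched = (pd.merge(df1, df2, left_on="employee", right_on="name")
             .drop(columns="name"))

print(matched)
```

With an inner merge no NaN is introduced, so hire_date and hire_dates stay integers instead of becoming floats as in the outer result above.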

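5) Inner and outer merges: outer takes the union, inner takes the intersection

Inner merge: keeps only the keys present in both frames (the default)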

df6 = DataFrame({'name':['Peter','Paul','Mary'],
               'food':['fish','beans','bread']}
               )
df6

	food	name
0	fish	Peter
1	beans	Paul
2	bread	Mary

df7 = DataFrame({'name':['Mary','Joseph'],
                'drink':['wine','beer']})
df7

	drink	name
0	wine	Mary
1	beer	Joseph

pd.merge(df6, df7)

	food	name	drink
0	bread	Mary	wine

Outer merge, how='outer': keeps every key and fills the gaps with NaN

pd.merge(df6, df7, how='outer')

	food	name	drink
0	fish	Peter	NaN
1	beans	Paul	NaN
2	bread	Mary	wine
3	NaN	Joseph	beer
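After an outer merge it is often useful to know which side each row came from. pd.merge has an indicator parameter for this; the sketch below applies it to df6 and df7, and the name `status` is an assumption made for the example.

```python
import pandas as pd

df6 = pd.DataFrame({"name": ["Peter", "Paul", "Mary"],
                    "food": ["fish", "beans", "bread"]})
df7 = pd.DataFrame({"name": ["Mary", "Joseph"],
                    "drink": ["wine", "beer"]})

# indicator=True adds a _merge column: left_only, right_only, or both
tagged = pd.merge(df6, df7, how="outer", indicator=True)

# Look the status up by name so the example does not depend on row order
status = tagged.set_index("name")["_merge"].astype(str).to_dict()
print(status)
```

Only Mary is marked both, which matches the single row returned by the inner merge.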

Original article: https://www.cnblogs.com/zyyhxbs/p/11708522.html
