Summary of Indexing Operations on pandas DataFrames

Posted: 2020-04-02 01:18:13


New pandas users often find DataFrame indexing confusing. This article walks through each indexing form in detail and closes with a summary of how they behave.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(16).reshape(4, 4), index=['Ohio', 'Colorado', 'Utah', 'New York'], columns=['one', 'two', 'three', 'four'])
df

	one	two	three	four
Ohio	0	1	2	3
Colorado	4	5	6	7
Utah	8	9	10	11
New York	12	13	14	15

(1) df[val]

When val is a single column label, df[val] selects that column from the DataFrame and returns a Series.

df['one']

Ohio         0
Colorado     4
Utah         8
New York    12
Name: one, dtype: int32

When val is a list of column labels, df[val] selects those columns from the DataFrame and returns a DataFrame.

df[['one','two']]

	one	two
Ohio	0	1
Colorado	4	5
Utah	8	9
New York	12	13
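
As a minimal sketch of the two cases above, the snippet below rebuilds the article's frame and checks what each form returns. The variable names (single, several, one_in_list) are illustrative only.

```python
import numpy as np
import pandas as pd

# Same frame as in the article, rebuilt so the snippet runs on its own.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

single = df["one"]             # one column label      -> Series
several = df[["one", "two"]]   # list of column labels -> DataFrame
one_in_list = df[["one"]]      # a one-element list still gives a DataFrame

print(type(single).__name__)       # Series
print(type(several).__name__)      # DataFrame
print(type(one_in_list).__name__)  # DataFrame
```

The brackets decide the return type: a bare label gives a Series, while any list, even a one-element list, gives a DataFrame.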

When val is a slice such as :num, df[val] selects rows rather than columns. This is a convenience shortcut equivalent to df.iloc[:num], the accessor dedicated to positional row selection.

df[:2]

	one	two	three	four
Ohio	0	1	2	3
Colorado	4	5	6	7

df.iloc[:2] # same result as above

	one	two	three	four
Ohio	0	1	2	3
Colorado	4	5	6	7

df[1:3]

	one	two	three	four
Colorado	4	5	6	7
Utah	8	9	10	11

df.iloc[1:3]

	one	two	three	four
Colorado	4	5	6	7
Utah	8	9	10	11
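
The claim that an integer slice inside df[...] behaves like df.iloc[...] can be checked directly. This is a sketch on the same frame, rebuilt so it runs on its own; by_brackets and by_iloc are made-up names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_brackets = df[1:3]    # integer slice inside plain brackets selects rows
by_iloc = df.iloc[1:3]   # the explicit positional form

# Both contain the rows at positions 1 and 2; the end position 3 is excluded.
print(by_brackets.equals(by_iloc))
print(list(by_brackets.index))
```

Writing df.iloc[1:3] states the intent explicitly, which can help readers who expect df[...] to always select columns.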

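When val is a boolean DataFrame, df[val] keeps the values where the mask is True and replaces the rest with NaN (which is why the remaining integers are displayed as floats). Assigning to df[val] sets only the values where the mask is True.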

df<5

	one	two	three	four
Ohio	True	True	True	True
Colorado	True	False	False	False
Utah	False	False	False	False
New York	False	False	False	False

df[df<5]

	one	two	three	four
Ohio	0.0	1.0	2.0	3.0
Colorado	4.0	NaN	NaN	NaN
Utah	NaN	NaN	NaN	NaN
New York	NaN	NaN	NaN	NaN

df[df<5]=0;df

	one	two	three	four
Ohio	0	0	0	0
Colorado	0	5	6	7
Utah	8	9	10	11
New York	12	13	14	15
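
The sketch below separates the two uses of a boolean mask, reading and assigning, on a fresh copy of the original frame. It also compares df[mask] with df.where(mask), which should give the same result. The names mask, selected, via_where and changed are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

mask = df < 5                # boolean DataFrame with the same shape as df

# Reading: values where the mask is False become NaN.
selected = df[mask]
via_where = df.where(mask)

# Assigning: only the cells where the mask is True are overwritten.
changed = df.copy()          # work on a copy so df keeps its original values
changed[mask] = 0

print(selected.equals(via_where))
print(changed.loc["Colorado"].tolist())
```

Working on a copy keeps the original values available, whereas the article's df[df<5]=0 modifies df in place, so every later output in the article shows the zeros.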

(2) df.loc[val]

When val is a single index label, df.loc[val] selects the corresponding row and returns a Series. When val is a list of index labels, it selects the corresponding rows and returns a DataFrame.

df.loc['Colorado']

one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32

df.loc[['Colorado','New York']]

	one	two	three	four
Colorado	0	5	6	7
New York	12	13	14	15
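
A short sketch of row selection by label follows. It uses the frame with its original values (before the df[df<5]=0 assignment), so Colorado's first value is 4 here rather than 0. The label 'Texas' is a hypothetical missing label, included to show that df.loc raises KeyError for a label that is not in the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

row = df.loc["Colorado"]                  # single label   -> Series
rows = df.loc[["Colorado", "New York"]]   # list of labels -> DataFrame

# The row Series is named after the label and indexed by the column labels.
print(row.name, list(row.index))

# A label that does not exist in the index raises KeyError.
try:
    df.loc["Texas"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```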

(3) df.loc[:,val]

When val is a single column label, df.loc[:,val] selects the corresponding column and returns a Series. When val is a list of column labels, it selects the corresponding columns and returns a DataFrame.

df.loc[:,'two']

Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

df.loc[:,['two']] # Note: a list always returns a DataFrame, even when it holds a single element.

	two
Ohio	0
Colorado	5
Utah	9
New York	13

df.loc[:,['one','two']]

	one	two
Ohio	0	0
Colorado	0	5
Utah	8	9
New York	12	13

df[['one','two']] # same result as df.loc[:,['one','two']] above

	one	two
Ohio	0	0
Colorado	0	5
Utah	8	9
New York	12	13
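
To make the equivalence between df.loc[:,val] and df[val] concrete, the sketch below compares the two forms on the frame with its original values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Column selection through .loc matches plain bracket selection.
same_series = df.loc[:, "two"].equals(df["two"])
same_frame = df.loc[:, ["one", "two"]].equals(df[["one", "two"]])

# A one-element list keeps the result two-dimensional.
one_column_frame = df.loc[:, ["two"]]

print(same_series, same_frame)
print(one_column_frame.shape)
```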

(3) df.loc[val1,val2]

val1 may be a single index label or a list of index labels, and val2 may be a single column label or a list of column labels; df.loc[val1,val2] selects the data at their intersection. Either val1 or val2 can also be : to select everything along that axis.

df.loc['Ohio','one']

df.loc[['Ohio','Utah'],'one']

Ohio    0
Utah    8
Name: one, dtype: int32

df.loc['Ohio',['one','two']]

one    0
two    0
Name: Ohio, dtype: int32

df.loc[['Ohio','Utah'],['one','two']]

	one	two
Ohio	0	0
Utah	8	9

df.loc[:,:]

	one	two	three	four
Ohio	0	0	0	0
Colorado	0	5	6	7
Utah	8	9	10	11
New York	12	13	14	15

df.loc['Ohio',:]

one      0
two      0
three    0
four     0
Name: Ohio, dtype: int32

df.loc[:,'two']

Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

df.loc[:,['one','two']]

	one	two
Ohio	0	0
Colorado	0	5
Utah	8	9
New York	12	13
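
The return type of df.loc[val1,val2] depends on which of the two arguments are lists. The sketch below collects one example of each combination in a dictionary (the name results and its keys are made up for illustration) and prints the type of each result. It uses the frame with its original values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

results = {
    "label, label": df.loc["Ohio", "one"],                     # scalar
    "list, label": df.loc[["Ohio", "Utah"], "one"],            # Series
    "label, list": df.loc["Ohio", ["one", "two"]],             # Series
    "list, list": df.loc[["Ohio", "Utah"], ["one", "two"]],    # DataFrame
    "colon, list": df.loc[:, ["one", "two"]],                  # DataFrame
}

for form, value in results.items():
    print(f"{form:>12} -> {type(value).__name__}")
```

Each single label removes one dimension from the result: two single labels give a scalar, one gives a Series, and none gives a DataFrame.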

(4) df.iloc[val]

df.iloc[val] works like df.loc[val], except that val must be an integer or a list of integers giving the row position rather than the row label.

df.iloc[1]

one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32

df.iloc[[1,3]]

	one	two	three	four
Colorado	0	5	6	7
New York	12	13	14	15
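
One way to see how positions relate to labels is to translate labels into positions with the index's get_loc and get_indexer methods and pass the result to df.iloc. This is only a sketch on the frame with its original values, not something the article itself does; pos and positions are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Position of a single label in the (unique) index.
pos = df.index.get_loc("Colorado")

# Positions of several labels at once, returned as an integer array.
positions = df.index.get_indexer(["Colorado", "New York"])

# Selecting by position gives the same rows as selecting by label.
print(df.iloc[pos].equals(df.loc["Colorado"]))
print(df.iloc[positions].equals(df.loc[["Colorado", "New York"]]))
```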

(5) df.iloc[:,val]

The same as df.loc[:,val], except that val must be an integer or a list of integers giving the column position.

df

	one	two	three	four
Ohio	0	0	0	0
Colorado	0	5	6	7
Utah	8	9	10	11
New York	12	13	14	15

df.iloc[:,1]

Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

df.iloc[:,[1,3]]

	two	four
Ohio	0	0
Colorado	5	7
Utah	9	11
New York	13	15

(6) df.iloc[val1,val2]

The same as df.loc[val1,val2], except that val1 and val2 must each be an integer or a list of integers (or :).

df.iloc[1,2]

df.iloc[1,[1,2,3]]

two      5
three    6
four     7
Name: Colorado, dtype: int32

df.iloc[[1,2],2]

Colorado     6
Utah        10
Name: three, dtype: int32

df.iloc[[1,2],[1,2]]

	two	three
Colorado	5	6
Utah	9	10

df.iloc[:,[1,2]]

	two	three
Ohio	0	0
Colorado	5	6
Utah	9	10
New York	13	14

df.iloc[[1,2],:]

	one	two	three	four
Colorado	0	5	6	7
Utah	8	9	10	11
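
The sketch below pairs each two-argument df.iloc form with what should be its label-based df.loc counterpart on the frame with its original values. The list name checks is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Each entry compares a positional selection with its label-based twin.
checks = [
    df.iloc[1, 2] == df.loc["Colorado", "three"],
    df.iloc[1, [1, 2, 3]].equals(df.loc["Colorado", ["two", "three", "four"]]),
    df.iloc[[1, 2], 2].equals(df.loc[["Colorado", "Utah"], "three"]),
    df.iloc[[1, 2], [1, 2]].equals(df.loc[["Colorado", "Utah"], ["two", "three"]]),
    df.iloc[:, [1, 2]].equals(df.loc[:, ["two", "three"]]),
    df.iloc[[1, 2], :].equals(df.loc[["Colorado", "Utah"], :]),
]

print(all(checks))
```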

(7) df.at[val1,val2]

val1 must be a single index label and val2 must be a single column label.

df.at['Utah','one']

df.loc['Utah','one'] # same result as above

df.at[['Utah','Colorado'],'one'] # raises an exception

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
   2538         try:
-> 2539             return engine.get_value(series._values, index)
   2540         except (TypeError, ValueError):


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()


TypeError: '['Utah', 'Colorado']' is an invalid key


During handling of the above exception, another exception occurred:


TypeError                                 Traceback (most recent call last)

<ipython-input-77-c52a9db91739> in <module>()
----> 1 df.at[['Utah','Colorado'],'one']


D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   2140 
   2141         key = self._convert_key(key)
-> 2142         return self.obj._get_value(*key, takeable=self._takeable)
   2143 
   2144     def __setitem__(self, key, value):


D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
   2543             # use positional
   2544             col = self.columns.get_loc(col)
-> 2545             index = self.index.get_loc(index)
   2546             return self._get_value(index, col, takeable=True)
   2547     _get_value.__doc__ = get_value.__doc__


D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3076                                  'backfill or nearest lookups')
   3077             try:
-> 3078                 return self._engine.get_loc(key)
   3079             except KeyError:
   3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()


TypeError: '['Utah', 'Colorado']' is an invalid key
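
The behaviour above can be reproduced without a long traceback by catching the exception. The traceback shows a TypeError, but the exact exception class may differ in other pandas versions, so this sketch catches Exception broadly and only records the class name. It uses the frame with its original values; value, error_name and subset are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Scalar access: one row label and one column label.
value = df.at["Utah", "one"]
print(value == df.loc["Utah", "one"])

# A list of row labels is rejected by .at.
try:
    df.at[["Utah", "Colorado"], "one"]
    error_name = None
except Exception as exc:  # class depends on the pandas version (assumption)
    error_name = type(exc).__name__
print(error_name)

# For several rows, .loc accepts the list.
subset = df.loc[["Utah", "Colorado"], "one"]
print(subset.tolist())
```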

(8) df.iat[val1,val2]

The same as df.at, except that val1 and val2 must both be integers giving the row and column positions.

df.iat[2,2]

df

	one	two	three	four
Ohio	0	0	0	0
Colorado	0	5	6	7
Utah	8	9	10	11
New York	12	13	14	15
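
As a small check, the same cell can be read through all four accessors. The sketch assumes the frame with its original values (the cell at row 2, column 2 is the same in the modified frame); the names by_iat, by_iloc, by_at and by_loc are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_iat = df.iat[2, 2]              # positions, scalar only
by_iloc = df.iloc[2, 2]            # positions, general purpose
by_at = df.at["Utah", "three"]     # labels, scalar only
by_loc = df.loc["Utah", "three"]   # labels, general purpose

print(by_iat, by_iloc, by_at, by_loc)
```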

Conclusion

df[val]: val can be a column label or a list of column labels, which selects whole columns. As special cases, val can be a slice such as :num, which selects the corresponding rows, or a boolean DataFrame, which selects or sets values by mask.
df.loc[val] is mainly used to select rows, or a combination of rows and columns. val can be a single row label, a list of row labels, or the pair val1,val2, where each of val1 and val2 is a single label, a list of labels, or :. In the pair form it selects the intersection of index labels val1 and column labels val2.
df.iloc[val] works the same way as df.loc[val], except that val must be given as integer positions, either a single integer or a list of integers.
df.at[val1,val2] accepts only a single index label and a single column label, and df.iat[val1,val2] accepts only a single pair of integer positions.

Original article: https://www.cnblogs.com/johnyang/p/12617102.html
