Creating a DataFrame
>>> import pandas as pd
>>> from pandas import DataFrame
# Define a dict
>>> dic = {'Name': ['Jeff', 'Lucy', 'Evan'], 'Age': [28, 26, 27], 'Sex': ['Male', 'Female', 'Male']}
# Load the dict into a DataFrame
>>> df = DataFrame(dic)
>>> print(df)
   Age  Name     Sex
0   28  Jeff    Male
1   26  Lucy  Female
2   27  Evan    Male
# The column order above is the default (here alphabetical).
# Pass columns= to set the order explicitly.
>>> df1 = DataFrame(dic, columns=['Name', 'Sex', 'Age'])
>>> df1
   Name     Sex  Age
0  Jeff    Male   28
1  Lucy  Female   26
2  Evan    Male   27
# A column name that has no data in the dict becomes an empty (NaN) column
>>> df1 = DataFrame(dic, columns=['Name', 'Age', 'Sex', 'Major'])
>>> df1
   Name  Age     Sex  Major
0  Jeff   28    Male    NaN
1  Lucy   26  Female    NaN
2  Evan   27    Male    NaN
# Set the row labels with index=
>>> df1 = DataFrame(dic, columns=['Name', 'Age', 'Sex', 'Major'], index=['one', 'two', 'three'])
>>> df1
       Name  Age     Sex  Major
one    Jeff   28    Male    NaN
two    Lucy   26  Female    NaN
three  Evan   27    Male    NaN
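The session above was recorded on Python 2 with an older pandas release. Here is a minimal sketch of the same steps as a Python 3 script, assuming a recent pandas version. The variable names mirror the session. The remark about default column order is an assumption about current pandas behaviour rather than something the original session shows.

```python
import pandas as pd

# Same data as in the session above
dic = {
    "Name": ["Jeff", "Lucy", "Evan"],
    "Age": [28, 26, 27],
    "Sex": ["Male", "Female", "Male"],
}

# Default construction: recent pandas versions should keep the dict's
# insertion order instead of sorting the column names (assumption).
df = pd.DataFrame(dic)

# Explicit column order; "Major" has no data in dic, so it is filled with NaN.
# index= replaces the default 0..n-1 row labels.
df1 = pd.DataFrame(
    dic,
    columns=["Name", "Age", "Sex", "Major"],
    index=["one", "two", "three"],
)

print(df)
print(df1)
```

Passing `columns=` explicitly is the safer choice when the order matters, because it does not depend on which pandas version builds the frame.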
Reading and Modifying DataFrame Contents
>>> df1.columns
Index([u'Name', u'Age', u'Sex', u'Major'], dtype='object')
# Select a column by attribute or by key
>>> df1.Sex
one        Male
two      Female
three      Male
Name: Sex, dtype: object
>>> df1['Sex']
one        Male
two      Female
three      Male
Name: Sex, dtype: object
# Select a row by label (.ix was removed in pandas 1.0; df1.loc['two'] is the current equivalent)
>>> df1.ix['two']
Name       Lucy
Age          26
Sex      Female
Major       NaN
Name: two, dtype: object
>>> df1.index
Index([u'one', u'two', u'three'], dtype='object')
# Assign values to a column (s1 here is a plain list, not a Series)
>>> df1
       Name  Age     Sex  Major
one    Jeff   28    Male    NaN
two    Lucy   26  Female    NaN
three  Evan   27    Male    NaN
>>> s1 = ['Se', 'Se', 'Ce']
>>> df1.Major = s1
>>> df1
       Name  Age     Sex  Major
one    Jeff   28    Male     Se
two    Lucy   26  Female     Se
three  Evan   27    Male     Ce
# Define a new column
>>> df1['Type'] = df1.Major == 'Se'
>>> df1
       Name  Age     Sex  Major   Type
one    Jeff   28    Male     Se   True
two    Lucy   26  Female     Se   True
three  Evan   27    Male     Ce  False
# Remove a column
>>> del df1['Type']
>>> df1
       Name  Age     Sex  Major
one    Jeff   28    Male     Se
two    Lucy   26  Female     Se
three  Evan   27    Male     Ce
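The row lookup in the session relies on `.ix`, which no longer exists in pandas 1.0 and later. Below is a rough sketch of the same read and modify steps with `.loc` and `.iloc`, assuming Python 3 and a recent pandas. The `partial` Series and the `aligned` copy are hypothetical names added only to illustrate how assigning a Series differs from assigning a list.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Name": ["Jeff", "Lucy", "Evan"], "Age": [28, 26, 27], "Sex": ["Male", "Female", "Male"]},
    columns=["Name", "Age", "Sex", "Major"],
    index=["one", "two", "three"],
)

# Row access: .loc selects by label, .iloc by integer position
row_by_label = df1.loc["two"]
row_by_position = df1.iloc[1]

# Assigning a list fills the column positionally (length must match the frame)
df1["Major"] = ["Se", "Se", "Ce"]

# A new column computed from an existing one
df1["Type"] = df1["Major"] == "Se"
type_flags = df1["Type"].tolist()

# drop() returns a new frame without the column; del df1["Type"] also works
df1 = df1.drop(columns=["Type"])

# Assigning a Series aligns on the index: labels missing from the Series get NaN
aligned = df1.copy()
partial = pd.Series(["Se", "Ce"], index=["one", "three"])  # hypothetical example data
aligned["Major"] = partial

print(df1)
print(aligned)
```

Bracket assignment (`df1["Major"] = ...`) is used here instead of attribute assignment because it also works for columns that do not exist yet.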
Other Ways to Define a DataFrame
# Define a DataFrame from a two-level (nested) dict:
# outer keys become columns, inner keys become row labels
>>> dic1 = {'name': {'1': 'Jeff', '2': 'Mia', '3': 'Evan'}, 'age': {'1': 28, '3': 27, '2': 18, '4': 23}}
>>> df2 = DataFrame(dic1)
>>> df2
   age  name
1   28  Jeff
2   18   Mia
3   27  Evan
4   23   NaN
# Transpose
>>> df2.T
         1    2     3    4
age     28   18    27   23
name  Jeff  Mia  Evan  NaN
# Name the column axis and the index
>>> df2.columns.name = 'items'
>>> df2.index.name = 'student_id'
>>> df2
items       age  name
student_id
1            28  Jeff
2            18   Mia
3            27  Evan
4            23   NaN
# Get the underlying data as a NumPy array
>>> df2.values
array([[28L, 'Jeff'],
       [18L, 'Mia'],
       [27L, 'Evan'],
       [23L, nan]], dtype=object)
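The nested-dict construction can be sketched for Python 3 as follows, assuming a recent pandas. The data is the same as in the session; `to_numpy()` is used here as an alternative to `.values` and is not part of the original session.

```python
import pandas as pd

# Outer keys -> columns, inner keys -> row labels.
# 'name' has no entry for '4', so that cell becomes NaN.
dic1 = {
    "name": {"1": "Jeff", "2": "Mia", "3": "Evan"},
    "age": {"1": 28, "3": 27, "2": 18, "4": 23},
}
df2 = pd.DataFrame(dic1)

# Transpose: rows and columns swap places
transposed = df2.T

# Name the two axes; the names show up in the printed frame
df2.columns.name = "items"
df2.index.name = "student_id"

# Mixed int/str columns yield a 2-D array with dtype=object
arr = df2.to_numpy()

print(df2)
print(transposed)
print(arr)
```

The row index is the union of the inner keys, which is why student `'4'` appears even though only the `age` dict mentions it.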