标签:splay res man mamicode get following ret series OWIN
# Party’s Frequency of donations nyc.groupby(’Party’)[’contb receipt amt’].count()
The command returns a series where the index is the name of a Party and the value is the count of that Party. Note that the series is ordered by the name of Party alphabetically.
# Party’s Frequency of donations by Date nyc.groupby([’Party’, ’Date’])[’contb receipt amt’].count()
# Party’s Sum of donations nyc.groupby(’Party’)[’contb receipt amt’].sum() # Define the format of float pd.options.display.float format = ’{:,.2f}’.format
nyc.groupby(’Party’)[’contb receipt amt’].sum()
# Top 5 Donors, by Occupation df7 = nyc.groupby(’contbr occupation’)[’contb receipt amt’]. sum(). reset index () df7.sort_values(’contb receipt amt’, ascending=False, inplace =True) df7.head(5) #or df7.nlargest(5,’contb receipt amt’)
# Bottom 5 Donors, by Occupation df8 = nyc.groupby(’contbr occupation’)[’contb receipt amt’]. sum() . reset index () df8 . sort_values (by=’ contb receipt amt ’ , inplace=True) df8.head(5) # OR df7.tail(5) #OR df8.nsmallest(5,’contb receipt amt’)
Get rid of negative values:
df8 [ df8 . contb receipt amt >0].head(5)
The following commands give an example to find the Top 5 occupations that donated to each cadidate. Note that we need to sort the table based on two variables, firtly sorted by candidate name alphabetically and then sorted by contribution amount in a descending order. Finally, we hope to show the Top 5 occupations for each candidate.
# Top 5 Occupations that donated to Each Candidate df10 = nyc.groupby ([ ’cand_nm’ , ’contbr_occupation’ ]) [ ’contb_receipt_amt’ ].sum().reset_index () df10.sort_values ([ ’cand_nm’ , ’contb_receipt_amt’ ] , ascending =[True , False ], inplace=True) df10.groupby(’cand_nm’).head(5)
#Top 5 Fundraising Candidates Line Graph df11 = nyc.groupby(’cand_nm’)[’contb_receipt_amt’].sum(). reset_index () df11_p = df11.nlargest(5,’contb_receipt_amt’) df11_g = nyc[nyc.cand_nm.isin(df11_p.cand_nm)][[ ’cand_nm’,’Date’,’contb_receipt_amt’]] dfpiv=pd.pivot table(df11_g , values=’contb_receipt_amt’, index=[’Date’],columns=[’cand_nm’], aggfunc=np.sum)
dfpiv.loc[‘2016-01-01‘:‘2016?01?30‘].plot.line()
[Python Cookbook] Pandas Groupby
标签:splay res man mamicode get following ret series OWIN
原文地址:https://www.cnblogs.com/sherrydatascience/p/10360750.html