
Using dplyr

Date: 2014-12-07 14:56:33


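For data preprocessing I have always used Hadley Wickham's plyr package, switching to the data.table package once the data gets somewhat larger. Hadley Wickham's dplyr package has been out for a while now and brings a further large improvement in performance. These are some notes for future use.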

These five functions provide the basis of a language of data manipulation. At the most basic level, you can only alter a tidy data frame in five useful ways: you can reorder the rows (arrange()), pick observations and variables of interest (filter() and select()), add new variables that are functions of existing variables (mutate()) or collapse many values to a summary (summarise()). The remainder of the language comes from applying the five functions to different types of data, like to grouped data, as described next.
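As a rough illustration of the five verbs described above, the sketch below applies each one to a small data frame. The `trips` data and its column names are hypothetical and were made up for this example; only the dplyr functions themselves come from the text.

```r
library(dplyr)

# Hypothetical toy data, not taken from the original post
trips <- data.frame(
  plane = c("A", "A", "B", "B", "B"),
  distance = c(100, 300, 200, 400, 600),
  delay = c(5, 15, -10, 20, 35)
)

late <- filter(trips, delay > 0)               # keep rows with a positive delay
slim <- select(late, plane, distance, delay)   # keep only the named columns
sorted <- arrange(slim, desc(distance))        # reorder rows, longest trip first
rated <- mutate(sorted, delay_per_100 = delay / distance * 100)  # add a derived column

# Applying summarise() to grouped data collapses each group to one row
by_plane <- group_by(rated, plane)
result <- summarise(by_plane, count = n(), dist = mean(distance))
```

Each verb takes a data frame as its first argument and returns a new one, which is why the steps can be chained by passing one result into the next call.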

 

Example 1: Comparing plyr::ddply and dplyr::group_by

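    library(dplyr)
    library(nycflights13)  # assumed source of the flights data; the original does not say

    # dplyr: group by tail number, then summarise each group
    system.time({
      planes <- group_by(flights, tailnum)
      delay <- summarise(planes,
        count = n(),
        dist = mean(distance, na.rm = TRUE),
        delay = mean(arr_delay, na.rm = TRUE)
      )
    })
    #  user  system elapsed
    # 0.092   0.003   0.097

    # plyr: the same summary with ddply; the grouping variable must be quoted
    system.time({
      plyr::ddply(flights, "tailnum", function(x) data.frame(
        count = nrow(x),
        dist = mean(x$distance, na.rm = TRUE),
        delay = mean(x$arr_delay, na.rm = TRUE)
      ))
    })
    #  user  system elapsed
    # 2.467   0.016   2.500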
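The timing comparison is only meaningful if both calls compute the same summary. A minimal sketch of such a check on a small data frame follows. The `toy` data is hypothetical and merely reuses the column names from the example, and the sketch assumes both dplyr and plyr are installed.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the example above
toy <- data.frame(
  tailnum = c("N1", "N1", "N2", "N2", "N2"),
  distance = c(100, 300, 200, 400, 600),
  arr_delay = c(5, NA, -10, 20, 35)
)

# dplyr version: group, then summarise
with_dplyr <- summarise(group_by(toy, tailnum),
  count = n(),
  dist = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE)
)

# plyr version: split by tailnum and build one summary row per piece
with_plyr <- plyr::ddply(toy, "tailnum", function(x) data.frame(
  count = nrow(x),
  dist = mean(x$distance, na.rm = TRUE),
  delay = mean(x$arr_delay, na.rm = TRUE)
))

# Compare the summary columns of the two results
same_dist <- isTRUE(all.equal(with_dplyr$dist, with_plyr$dist))
same_delay <- isTRUE(all.equal(with_dplyr$delay, with_plyr$delay))
```

plyr is called through `plyr::` rather than attached with `library()`, so its own `summarise()` does not mask the dplyr function of the same name.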

 


Original article: http://www.cnblogs.com/BinbinChen/p/4149336.html
