I. Creating new variables
The transform() function
> mydata <- data.frame(x1 = c(2, 2, 6, 4), x2 = c(3, 3, 4, 1))
> mydata
  x1 x2
1  2  3
2  2  3
3  6  4
4  4  1
> mydata <- transform(mydata, sums = x1 + x2, means = (x1 + x2) / 2)
> mydata
  x1 x2 sums means
1  2  3    5   2.5
2  2  3    5   2.5
3  6  4   10   5.0
4  4  1    5   2.5
II. Recoding variables
(1)
leadership$agecat[leadership$age > 75] <- "Elder"
leadership$agecat[leadership$age > 45 & leadership$age <= 75] <- "Middle Aged"
leadership$agecat[leadership$age <= 45] <- "Young"
(2)
leadership <- within(leadership, {
  agecat <- NA
  agecat[age > 75] <- "Elder"
  agecat[age >= 55 & age <= 75] <- "Middle Aged"
  agecat[age < 55] <- "Young"
})
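The same three-way recode can also be sketched with nested ifelse() calls. This is an alternative, not the method shown above; the data frame `df` and its ages are made-up illustration values, and the thresholds are the ones from variant (2).

```r
# Hypothetical ages, for illustration only (not the real leadership data)
df <- data.frame(age = c(32, 55, 75, 76, NA))

# Nested ifelse(): conditions are tested from the top down,
# and a missing age yields a missing category
df$agecat <- ifelse(df$age > 75, "Elder",
                    ifelse(df$age >= 55, "Middle Aged", "Young"))
print(df)
```

Because ifelse() propagates NA, no separate `agecat <- NA` initialization is needed here.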
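III. Renaming variables
(1) fix() opens an interactive editor
(2) rename() from the reshape package
library(reshape)
# rename() returns a modified copy, so assign the result back
leadership <- rename(leadership, c(manager = "managerID", date = "testDate"))
(3) names()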
names(leadership)[6:10]<-c("item1","item2","item3","item4","item5")
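Indexing names() by position, as above, breaks if the column order changes. A minimal sketch of renaming by matching the old name instead, using a made-up two-column data frame `df` in place of leadership:

```r
# Hypothetical data frame standing in for `leadership`
df <- data.frame(manager = 1:2, date = c("10/24/08", "10/28/08"))

# Match on the old name, so the column position does not matter
names(df)[names(df) == "manager"] <- "managerID"
names(df)[names(df) == "date"] <- "testDate"
names(df)
```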
IV. Missing values
Excluding missing values from an analysis
# Applying the is.na() function
is.na(leadership[, 6:10])
# Recode 99 to missing for the variable age
leadership[leadership$age == 99, "age"] <- NA
leadership
# Using na.omit() to delete incomplete observations
newdata <- na.omit(leadership)
newdata
na.omit() drops every row that contains a missing value. More refined ways of handling missing values are covered in Chapter 15.
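When whole rows should not be dropped, many summary functions accept na.rm = TRUE to skip missing values for that one calculation. A small sketch with made-up vectors (`x` and `df` are illustration values only):

```r
# Hypothetical scores with one missing value
x <- c(1, 2, NA, 3)
total_with_na <- sum(x)               # NA: missing values propagate
total_clean <- sum(x, na.rm = TRUE)   # NA is dropped before summing

# complete.cases() flags the rows that na.omit() would keep
df <- data.frame(q1 = c(5, NA, 3), q2 = c(4, 2, NA))
kept <- df[complete.cases(df), ]
print(kept)
```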
V. Date values
1. Converting strings to dates
strDates <- c("01/05/1965", "08/16/1975")
dates <- as.Date(strDates, "%m/%d/%Y")
2. Current date and time
Sys.Date()  # current date
date()      # current date and time
3. Printing dates in a given format
today <- Sys.Date()
format(today, format = "%B %d %Y")
format(today, format = "%A")
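The %B and %A codes above print month and weekday names, which depend on the system locale. A sketch using only numeric format codes on a fixed date, so the result does not vary by machine or by day (the variable names are illustrative):

```r
# A fixed date, parsed with the same format string as in the example above
d <- as.Date("01/05/1965", format = "%m/%d/%Y")

# Numeric codes: %Y four-digit year, %y two-digit year, %m month, %d day
iso <- format(d, "%Y-%m-%d")
compact <- format(d, "%d.%m.%y")
print(c(iso, compact))
```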
4. Computing time intervals
(1) Arithmetic on date values
startdate <- as.Date("2004-02-13")
enddate <- as.Date("2009-06-22")
days <- enddate - startdate
(2) Using the difftime() function
> today <- Sys.Date()
> dob <- as.Date("1956-10-12")
> difftime(today, dob, units = "weeks")
Time difference of 3102.571 weeks
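A difftime result carries its unit with it. To use the interval as a plain number, as.numeric() can convert it, optionally to another unit. A sketch using the two fixed dates from the arithmetic example (`gap`, `gap_days`, and `gap_weeks` are illustrative names):

```r
startdate <- as.Date("2004-02-13")
enddate <- as.Date("2009-06-22")

gap <- difftime(enddate, startdate, units = "days")
gap_days <- as.numeric(gap)                     # plain number of days
gap_weeks <- as.numeric(gap, units = "weeks")   # same interval in weeks
print(c(gap_days, gap_weeks))
```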
5. Converting dates to character variables
strDates<-as.character(dates)
VI. Type conversion, sorting, and merging
is.numeric(), as.numeric(), and so on
order()
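order() returns the row positions that would sort its arguments, so it is normally used as a row index. A minimal sketch with a made-up data frame (`df` and its values are illustration only):

```r
# Hypothetical data, for illustration only
df <- data.frame(name = c("May", "Jack", "Alex", "John"),
                 sex = c("f", "m", "f", "m"),
                 age = c(30, 25, 41, 25))

by_age <- df[order(df$age), ]               # ascending age
by_sex_age <- df[order(df$sex, -df$age), ]  # sex ascending, then age descending
print(by_sex_age)
```

A minus sign in front of a numeric key reverses the sort for that key only; ties keep their original relative order.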
> A <- data.frame(ID = c("May", "Jack"), SEX = c("f", "m"))
> B <- data.frame(ID = c("Alex", "John"), SEX = c("f", "m"))
> rbind(A, B)
    ID SEX
1  May   f
2 Jack   m
3 Alex   f
4 John   m
> C <- data.frame(ID = c("1", "2"), SEX = c("f", "m"))
> D <- data.frame(ID = c("1", "3"), SEX = c("m", "m"))
> merge(C, D, by = "ID")
  ID SEX.x SEX.y
1  1     f     m
> cbind(C, D)
  ID SEX ID SEX
1  1   f  1   m
2  2   m  3   m
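By default merge() keeps only the IDs found in both data frames, which is why the result above has a single row. A sketch of the all and all.x arguments on the same C and D (the result names `inner`, `outer`, and `left` are illustrative):

```r
C <- data.frame(ID = c("1", "2"), SEX = c("f", "m"))
D <- data.frame(ID = c("1", "3"), SEX = c("m", "m"))

inner <- merge(C, D, by = "ID")                # only ID 1, present in both
outer <- merge(C, D, by = "ID", all = TRUE)    # IDs 1, 2, 3; gaps become NA
left <- merge(C, D, by = "ID", all.x = TRUE)   # every row of C is kept
print(outer)
```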
VII. Subsetting datasets
1. Dropping variables (3 ways)
# Dropping variables
myvars <- names(leadership) %in% c("q3", "q4")
newdata <- leadership[!myvars]
newdata <- leadership[c(-7, -8)]
# You could use the following to delete q3 and q4
# from the leadership dataset (commented out so
# the rest of the code in this file will work)
#
# leadership$q3 <- leadership$q4 <- NULL
%in% tests membership, and ! negates the result.
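A small sketch of what those two operators produce, using a made-up vector of column names in place of names(leadership):

```r
# Hypothetical column names standing in for names(leadership)
vars <- c("manager", "age", "q1", "q2", "q3", "q4", "q5")

to_drop <- vars %in% c("q3", "q4")  # TRUE where the name is in the set
remaining <- vars[!to_drop]         # negation keeps everything else
print(remaining)
```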
2. Selecting by condition
which()
attach(leadership)
# With the data frame attached, columns can be referenced by bare name
newdata <- leadership[which(gender == "M" & age > 30), ]
detach(leadership)
3. The subset() function
This is the simplest way to select variables and observations, and it can replace the approaches above.
newdata <- subset(leadership, age >= 35 | age < 24,
                  select = c(q1, q2, q3, q4))
newdata <- subset(leadership, gender == "M" & age > 25,
                  select = gender:q4)
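The first call selects the rows where age >= 35 or age < 24 and keeps the variables q1 through q4 (that is, those columns). The second selects men older than 25 and keeps the columns from gender through q4.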
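To make the correspondence with bracket indexing concrete, here is a sketch that runs the second subset() call and a hand-written equivalent on a made-up stand-in for leadership. The values are invented, and the column layout is an assumption.

```r
# Hypothetical stand-in for the `leadership` data frame
leadership <- data.frame(gender = c("M", "F", "F", "M", "F"),
                         age = c(32, 45, 25, 39, 99),
                         q1 = c(5, 3, 3, 3, 2),
                         q2 = c(4, 5, 5, 3, 2),
                         q3 = c(5, 2, 5, 4, 1),
                         q4 = c(5, 5, 5, NA, 2),
                         q5 = c(5, 5, 2, NA, 1))

# Same selection written with subset() and with bracket indexing
a <- subset(leadership, gender == "M" & age > 25, select = gender:q4)
b <- leadership[which(leadership$gender == "M" & leadership$age > 25),
                c("gender", "age", "q1", "q2", "q3", "q4")]
print(all.equal(a, b))
```

Inside subset(), column names are evaluated within the data frame, which is why no `leadership$` prefix or attach() is needed.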
4. Random sampling
mysample <- leadership[sample(1:nrow(leadership), 3, replace = FALSE), ]
This draws 3 rows at random, sampling without replacement.
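sample() gives a different draw on every run. If a repeatable sample is needed, set.seed() can be called first. A sketch with a made-up ten-row data frame (the seed value is arbitrary):

```r
# Hypothetical data frame with 10 rows
df <- data.frame(id = 1:10, score = seq(50, 95, by = 5))

set.seed(1234)  # arbitrary seed, chosen only so reruns match
s1 <- df[sample(nrow(df), 3), ]  # replace = FALSE is the default

set.seed(1234)  # same seed again reproduces the same draw
s2 <- df[sample(nrow(df), 3), ]
print(identical(s1, s2))
```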
Original article: http://www.cnblogs.com/keyang/p/5332811.html