03 Basic Data Management

Posted: 2016-03-29 16:26:46

I. Creating New Variables

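The transform() function: it returns a copy of the data frame with the new columns appended, so the result has to be assigned back.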

> mydata<-data.frame(x1=c(2,2,6,4),x2=c(3,3,4,1))
> mydata
  x1 x2
1  2  3
2  2  3
3  6  4
4  4  1
> mydata<-transform(mydata,sums=x1+x2,means=(x1+x2)/2)
> mydata
  x1 x2 sums means
1  2  3    5   2.5
2  2  3    5   2.5
3  6  4   10   5.0
4  4  1    5   2.5
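For comparison, here is a minimal sketch of two other base R ways to add the same columns: plain `$` assignment and `within()`. The data are copied from the example above, and `d1`, `d2`, `d3` are illustrative names. `within()` can append new columns in a different order than they are created, so the sketch compares column values rather than whole data frames.

```r
# Toy data from the transform() example above
mydata <- data.frame(x1 = c(2, 2, 6, 4), x2 = c(3, 3, 4, 1))

# Way 1: add each column with $ (the prefix is repeated every time)
d1 <- mydata
d1$sums  <- d1$x1 + d1$x2
d1$means <- (d1$x1 + d1$x2) / 2

# Way 2: within() evaluates the block inside the data frame,
# so x1 and x2 can be used without a prefix
d2 <- within(mydata, {
  sums  <- x1 + x2
  means <- (x1 + x2) / 2
})

# Way 3: transform(), as in the text
d3 <- transform(mydata, sums = x1 + x2, means = (x1 + x2) / 2)

d1$sums    # 5 5 10 5
d2$means   # 2.5 2.5 5.0 2.5
```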

II. Recoding Variables

(1) Assigning to the elements selected by a logical condition

leadership$agecat[leadership$age > 75] <- "Elder"
leadership$agecat[leadership$age > 45 & 
    leadership$age <= 75] <- "Middle Aged"
leadership$agecat[leadership$age <= 45] <- "Young"

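(2) Using within(), which evaluates the code block inside the data frame so columns can be named without the leadership$ prefix. Note that this version puts the Young/Middle Aged boundary at 55 rather than 45.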

leadership <- within(leadership, {
    agecat <- NA
    agecat[age > 75] <- "Elder"
    agecat[age >= 55 & age <= 75] <- "Middle Aged"
    agecat[age < 55] <- "Young"
})
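Both versions assume that any missing-value code has already been replaced by NA. In the leadership data an age of 99 appears to mean "missing" (the missing-values section recodes it to NA). If it were left as 99, it would be labeled "Elder". Below is a minimal runnable sketch of the `within()` version on made-up ages; the data frame `df` is hypothetical.

```r
# Hypothetical ages; 99 is assumed to be a "missing" code
df <- data.frame(age = c(32, 45, 25, 39, 99, 60, 75))

# Turn the 99 code into NA first; otherwise age > 75 would label it "Elder"
df$age[df$age == 99] <- NA

df <- within(df, {
  agecat <- NA                                   # start from NA
  agecat[age > 75] <- "Elder"
  agecat[age >= 55 & age <= 75] <- "Middle Aged"
  agecat[age < 55] <- "Young"                    # NA ages stay NA
})

df$agecat
```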

III. Renaming Variables

(1) fix() opens an interactive editor in which variable names can be changed by hand, e.g. fix(leadership).

(2) rename() from the reshape package, which maps old names to new names and returns a new data frame

library(reshape)
# rename() does not modify leadership in place, so assign the result back
leadership <- rename(leadership, c(manager = "managerID", date = "testDate"))

(3) names(), here renaming columns 6 to 10 by position

names(leadership)[6:10]<-c("item1","item2","item3","item4","item5")
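Renaming by position, as with `[6:10]`, silently renames the wrong columns if the column order changes. The base R sketch below renames by matching the old name instead. The small `leadership` data frame is a hypothetical stand-in with only four columns.

```r
# Hypothetical stand-in for leadership, with only four columns
leadership <- data.frame(manager = 1:3,
                         date = c("10/24/08", "10/28/08", "10/1/08"),
                         q1 = c(5, 3, 3), q2 = c(4, 5, 5),
                         stringsAsFactors = FALSE)

# Rename by name: works wherever the column happens to sit
names(leadership)[names(leadership) == "manager"] <- "managerID"
names(leadership)[names(leadership) == "date"]    <- "testDate"

# Rename a block of columns by position, as in the text
names(leadership)[3:4] <- c("item1", "item2")

names(leadership)   # "managerID" "testDate" "item1" "item2"
```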

IV. Missing Values

Excluding missing values from analyses

# Applying the is.na() function
is.na(leadership[, 6:10])

# recode 99 to missing for the variable age
leadership[leadership$age == 99, "age"] <- NA
leadership

# Using na.omit() to delete incomplete observations
newdata <- na.omit(leadership)
newdata

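na.omit() deletes every row that contains at least one missing value. More sophisticated ways of handling missing data are covered in Chapter 15.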
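Most summary functions also accept `na.rm = TRUE`, which drops missing values from that one calculation instead of deleting whole rows. The sketch below uses made-up data. It also shows `complete.cases()`, which marks the rows that `na.omit()` would keep.

```r
# Made-up data with missing values
x <- c(1, 2, NA, 3)

sum(x)                  # NA: a single missing value propagates
sum(x, na.rm = TRUE)    # 6
mean(x, na.rm = TRUE)   # 2

df <- data.frame(a = c(1, NA, 3), b = c("u", "v", NA))
complete.cases(df)      # TRUE FALSE FALSE: TRUE only for rows with no NA
na.omit(df)             # keeps only the first row
```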

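V. Date Values

1. Converting strings to dates: as.Date(x, format), where in the format string %m is the month, %d the day, and %Y the four-digit year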

strDates <- c("01/05/1965", "08/16/1975")
dates <- as.Date(strDates, "%m/%d/%Y")
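Here is a quick check of what `as.Date()` produces from the strings above. Dates print in ISO `yyyy-mm-dd` form whatever the input format was. Internally they are stored as the number of days since 1970-01-01. Without a format argument, `as.Date()` tries `"%Y-%m-%d"` and then `"%Y/%m/%d"`.

```r
strDates <- c("01/05/1965", "08/16/1975")
dates <- as.Date(strDates, "%m/%d/%Y")
format(dates)                        # "1965-01-05" "1975-08-16"

# ISO strings need no format argument
as.Date("2004-02-13")

# Internal representation: days since 1970-01-01
as.numeric(as.Date("1970-01-10"))    # 9
```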

2. Current date and time

Sys.Date()   # current date, as a Date object
date()       # current date and time, as a character string

3. Formatting dates for output

today <- Sys.Date()
format(today, format = "%B %d %Y")  # full month name, day, four-digit year
format(today, format = "%A")        # full weekday name
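Format codes can be combined freely. The sketch below uses a fixed date so the results are reproducible. Outputs are shown only for the numeric codes, because `%b`, `%B` and `%A` print month and weekday names in the current locale.

```r
d <- as.Date("2009-06-22")   # fixed date instead of Sys.Date()

format(d, "%d/%m/%Y")   # "22/06/2009"
format(d, "%Y%m%d")     # "20090622"
format(d, "%j")         # day of the year: "173"
format(d, "%b %Y")      # abbreviated month name; locale-dependent
format(d, "%A")         # weekday name; locale-dependent
```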

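4. Computing time intervals

(1) Arithmetic on date values: subtracting one Date from another gives a difftime object measured in days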

startdate <- as.Date("2004-02-13")
enddate <- as.Date("2009-06-22")
days <- enddate - startdate
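The result prints as a `difftime`. `as.numeric()` turns it into a plain number, and adding an integer to a Date shifts the date by that many days. The sketch below uses the same two dates. The day count was worked out by hand: 1827 days from 2004-02-13 to 2009-02-13 (two leap days included), plus 129 days to 2009-06-22.

```r
startdate <- as.Date("2004-02-13")
enddate   <- as.Date("2009-06-22")
days <- enddate - startdate

days                 # Time difference of 1956 days
as.numeric(days)     # 1956, as a plain number
enddate + 30         # another Date: "2009-07-22"

# Regular sequences of dates
seq(startdate, by = "month", length.out = 3)   # "2004-02-13" "2004-03-13" "2004-04-13"
```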

(2) Using difftime(), which lets you choose the units ("auto", "secs", "mins", "hours", "days" or "weeks")

> today<-Sys.Date()
> dob<-as.Date("1956-10-12")
> difftime(today,dob,units="weeks")
Time difference of 3102.571 weeks
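Because `Sys.Date()` changes every day, the output above depends on when it was run. A reproducible version fixes the reference date at 2016-03-29, the date of this post; that choice is an assumption about when the output was produced, and it does reproduce the 3102.571 weeks shown. `difftime()` has no month or year unit, so the last line gives only an approximate age in years (days divided by 365.25).

```r
dob <- as.Date("1956-10-12")
ref <- as.Date("2016-03-29")   # fixed stand-in for Sys.Date()

w <- difftime(ref, dob, units = "weeks")
w                                                # Time difference of 3102.571 weeks
n_days <- as.numeric(difftime(ref, dob, units = "days"))
n_days                                           # 21718

# Approximate age in years
n_days / 365.25
```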

5. Converting dates to character variables

strDates<-as.character(dates)
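`as.character()` always produces the ISO form. To get any other text layout, use `format()` with a format string. A small sketch with two fixed dates:

```r
dates <- as.Date(c("1965-01-05", "1975-08-16"))

strDates <- as.character(dates)   # "1965-01-05" "1975-08-16"
class(strDates)                   # "character"

format(dates, "%m/%d/%Y")         # "01/05/1965" "08/16/1975"
```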

VI. Type Conversion, Sorting, and Merging

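Type testing and conversion: is.numeric(), as.numeric(), and similar pairs such as is.character()/as.character(). The is.* functions return TRUE or FALSE, and the as.* functions convert a value to that type.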
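Two behaviours of `as.numeric()` are worth knowing, shown here on made-up vectors. Strings that cannot be parsed become NA, with a warning that the sketch suppresses. On a factor, `as.numeric()` returns the internal level codes rather than the printed values.

```r
x <- c("1", "2.5", "abc")
is.numeric(x)                    # FALSE: x is character
y <- suppressWarnings(as.numeric(x))
y                                # 1.0 2.5 NA ("abc" cannot be converted)

# Pitfall: factors are stored as integer codes of the sorted levels
f <- factor(c("10", "5", "20"))  # levels: "10" "20" "5"
as.numeric(f)                    # 1 3 2 (codes, not values)
as.numeric(as.character(f))      # 10 5 20
```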

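Sorting: order() returns the positions of the elements in sorted order, ascending by default. Put a minus sign before a numeric variable to sort it in descending order.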
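To sort the rows of a data frame, index it with `order()`. Ties keep their original order unless another key breaks them. The sketch below uses a hypothetical data frame whose names are borrowed from the merge example.

```r
df <- data.frame(name = c("May", "Jack", "Alex", "John"),
                 age  = c(30, 25, 30, 22),
                 stringsAsFactors = FALSE)

by_age <- df[order(df$age), ]               # ascending age
by_age$name                                 # "John" "Jack" "May" "Alex"

# Age descending, ties broken by name ascending
by_age_desc <- df[order(-df$age, df$name), ]
by_age_desc$name                            # "Alex" "May" "Jack" "John"
```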

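Merging: rbind() stacks data frames that have the same variables. merge() joins two data frames on a key column; by default it is an inner join, so only IDs present in both are kept. cbind() simply places data frames side by side without matching rows.

> A<-data.frame(ID=c("May","Jack"),SEX=c("f","m"))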
> B<-data.frame(ID=c("Alex","John"),SEX=c("f","m"))
> rbind(A,B)
    ID SEX
1  May   f
2 Jack   m
3 Alex   f
4 John   m
> C<-data.frame(ID=c("1","2"),SEX=c("f","m"))
> D<-data.frame(ID=c("1","3"),SEX=c("m","m"))
> merge(C,D,by="ID")
  ID SEX.x SEX.y
1  1     f     m
> cbind(C,D)
  ID SEX ID SEX
1  1   f  1   m
2  2   m  3   m
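The output above also shows that `cbind()` pairs ID 2 with ID 3 without checking them. To keep unmatched IDs, `merge()` takes `all`, `all.x` and `all.y`. A sketch with the same two data frames:

```r
C <- data.frame(ID = c("1", "2"), SEX = c("f", "m"), stringsAsFactors = FALSE)
D <- data.frame(ID = c("1", "3"), SEX = c("m", "m"), stringsAsFactors = FALSE)

inner <- merge(C, D, by = "ID")               # only ID "1"
full  <- merge(C, D, by = "ID", all = TRUE)   # IDs 1, 2, 3; NA where absent
left  <- merge(C, D, by = "ID", all.x = TRUE) # every row of C

full
#   ID SEX.x SEX.y
# 1  1     f     m
# 2  2     m  <NA>
# 3  3  <NA>     m
```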

VII. Subsetting Datasets

1. Dropping variables (three ways)

# Dropping variables

myvars <- names(leadership) %in% c("q3", "q4")
newdata <- leadership[!myvars]

# q3 and q4 are the 8th and 9th columns (q1 to q5 occupy columns 6 to 10)
newdata <- leadership[c(-8, -9)]

# You could use the following to delete q3 and q4
# from the leadership dataset (commented out so 
# the rest of the code in this file will work)
#
# leadership$q3 <- leadership$q4 <- NULL

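%in% tests which variable names are in c("q3", "q4"), and ! negates the result, so leadership[!myvars] keeps every column except q3 and q4.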
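The sketch below runs all three approaches on a hypothetical stand-in for `leadership` and checks that they leave the same columns. The column layout (five ID/demographic columns followed by q1 to q5) is an assumption chosen to match the `[6:10]` renaming above.

```r
# Hypothetical stand-in: five columns, then q1-q5 in columns 6-10
leadership <- data.frame(manager = 1:3,
  date = c("10/24/08", "10/28/08", "10/1/08"),
  country = c("US", "UK", "UK"), gender = c("M", "F", "M"),
  age = c(32, 45, 25),
  q1 = c(5, 3, 3), q2 = c(4, 5, 5), q3 = c(5, 2, 5),
  q4 = c(5, 5, NA), q5 = c(5, 5, 2),
  stringsAsFactors = FALSE)

myvars <- names(leadership) %in% c("q3", "q4")  # TRUE for q3 and q4
a <- leadership[!myvars]                        # by name
b <- leadership[c(-8, -9)]                      # by position
d <- leadership
d$q3 <- d$q4 <- NULL                            # by deleting columns

names(a)   # "manager" "date" "country" "gender" "age" "q1" "q2" "q5"
```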

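2. Selecting observations by condition

which() returns the positions of the elements that are TRUE and skips NA results, so rows with a missing value in the condition are left out of the subset.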

# attach()/detach() are unnecessary here because every column
# is already referenced as leadership$...
newdata <- leadership[which(leadership$gender == "M" & 
    leadership$age > 30), ]
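The reason to wrap the condition in `which()` shows up when the condition contains NA. Plain logical indexing turns each NA into an extra all-NA row, while `which()` drops it. The sketch uses a hypothetical four-row data frame.

```r
leadership <- data.frame(gender = c("M", "F", "M", NA),
                         age = c(32, 45, 25, 40),
                         stringsAsFactors = FALSE)

cond <- leadership$gender == "M" & leadership$age > 30
cond                             # TRUE FALSE FALSE NA
which(cond)                      # 1

with_which <- leadership[which(cond), ]  # row 1 only
with_logic <- leadership[cond, ]         # row 1 plus an all-NA row
```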

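3. The subset() function

subset() is usually the simplest way to select both variables and observations, and it can replace the approaches above.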

newdata <- subset(leadership, age >= 35 | age < 24, 
    select = c(q1, q2, q3, q4))
newdata <- subset(leadership, gender == "M" & age > 
    25, select = gender:q4)

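The first call keeps the observations with age >= 35 or age < 24 and retains the variables q1 through q4. The second keeps male managers older than 25 and every variable from gender through q4 (the colon selects a contiguous range of columns).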
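Below is a runnable sketch of both calls on a hypothetical six-column data frame. A leading `manager` column is included so that `gender:q4` visibly leaves something out. Rows where the condition evaluates to NA are dropped by `subset()`.

```r
leadership <- data.frame(manager = 1:5,
                         gender = c("M", "F", "F", "M", "F"),
                         age = c(32, 45, 25, 39, 22),
                         q1 = c(5, 3, 3, 3, 2), q2 = c(4, 5, 5, 3, 2),
                         q3 = c(5, 2, 5, 4, 1), q4 = c(5, 5, 5, NA, 2),
                         stringsAsFactors = FALSE)

s1 <- subset(leadership, age >= 35 | age < 24,
             select = c(q1, q2, q3, q4))   # rows 2, 4, 5
s2 <- subset(leadership, gender == "M" & age > 25,
             select = gender:q4)           # rows 1, 4; manager dropped
```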

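4. Random sampling

mysample <- leadership[sample(1:nrow(leadership), 3, replace = FALSE), ]

This draws 3 rows at random without replacement: sample() picks 3 of the row indices 1 to nrow(leadership), and the comma inside [ ] keeps all columns.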
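The sketch below makes the draw reproducible with `set.seed()` (the seed value is arbitrary). It also shows sampling with replacement, as used for bootstrap resamples. `seq_len(nrow(df))` is a slightly safer spelling of `1:nrow(df)`, because it gives an empty sequence for a data frame with zero rows. The data frame is hypothetical.

```r
set.seed(1234)   # arbitrary seed, only for reproducibility
leadership <- data.frame(manager = 1:5, age = c(32, 45, 25, 39, 99))

# Without replacement: 3 distinct rows
idx <- sample(seq_len(nrow(leadership)), 3, replace = FALSE)
mysample <- leadership[idx, ]

# With replacement: same number of rows as the original, duplicates allowed
boot <- leadership[sample(seq_len(nrow(leadership)), nrow(leadership),
                          replace = TRUE), ]
```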

Original post: http://www.cnblogs.com/keyang/p/5332811.html
