标签:wan fun 2.3 行数据 table seq com return int
当我们用R进行数据合并的时候,有时候会遇到最复杂的情况:被合并的两个数据在行上不完全匹配,在列上也不完全匹配。为了保证合并后的数据不遗漏不重复,我们需要对数据进行拆分,打散,再分步合并。
思路如下:
1 找出两个table中共有的定位变量。
2 按照定位变量是否共有把其中一个table打散。
3 对于sub-table1进行横向合并,只保留新增列。
4 对于sub-table2进行纵向合并,保留所有列。
写了一个函数。
1 yt=function(data1,data2,id="id"){ 2 #columns in data1 3 col1=unique(colnames(data1)) 4 row1=unique(data1[[id]]) 5 #columns in data2 6 col2=unique(colnames(data2)) 7 row2=unique(data2[[id]]) 8 #subset the data2 into 2 tables: 9 #1.id in data1, keep only significant columns (could be whatever columns) 10 #2.id not in data1, keep all columns 11 inter.col=intersect(col1,col2) 12 inter.row=intersect(row1,row2) 13 14 data2.1=data.frame(data2[(data2[[id]] %in% inter.row),!(colnames(data2) %in% inter.col)], 15 data2[(data2[[id]] %in% inter.row),id]) 16 colnames(data2.1)=c(col2[-match(inter.col,col2)],id) 17 data2.2=data2[!(data2[[id]] %in% inter.row),] 18 19 #merge sub-data1 20 want1.1=merge(data1,data2.1,by=id,all.x=T) 21 22 #merge sub-data2 23 ##fill all the uncommon columns with NA 24 want1.1.na=colnames(data2.2)[-match(intersect(colnames(want1.1),colnames(data2)),colnames(data2.2))] 25 data2.2.na=colnames(want1.1)[-match(intersect(colnames(want1.1),colnames(data2)),colnames(want1.1))] 26 if (length(want1.1.na)>=1){ 27 for (i in 1:length(want1.1.na)){ 28 want1.1[[want1.1.na[i]]]=NA 29 } 30 } 31 if (length(data2.2.na)>=1){ 32 for (j in 1:length(data2.2.na)){ 33 data2.2[[data2.2.na[[j]]]]=NA 34 } 35 } 36 want=rbind(want1.1,data2.2) 37 38 39 print(data2.2.na) 40 print(want1.1.na) 41 return(want) 42 } 43 44 45 #testing 46 test=data.frame( id=c(paste0("a",seq(100))), 47 x1=rnorm(100,mean=0,sd=1), 48 x2=rexp(100,rate=1), 49 x3=rpois(100,lambda=50) 50 ) 51 test1=test[1:50,c("id","x1")] 52 test2=test[30:100,c("id","x2")] 53 54 55 id x1 x2 56 1 a1 0.133454784 NA 57 2 a10 -0.240987935 NA 58 3 a11 -0.898845789 NA 59 ... 60 23 a3 -2.340300232 NA 61 24 a30 0.083681099 0.06912847 62 25 a31 0.750677968 2.64913789 63 26 a32 -0.184669801 1.16523179 64 ...
by yant07
Combine vectors of different length in R
标签:wan fun 2.3 行数据 table seq com return int
原文地址:http://www.cnblogs.com/yant07/p/7748562.html