R Language - Data Processing - Splitting a Sample Set

Date: 2019-05-03 11:20:18


library(caret)   # provides createDataPartition()
library(Hmisc)   # provides describe()

> sIndex <- createDataPartition(outp$V1, p = 0.7, list = FALSE)
> outpTrain <- outp[sIndex]
> outpTest <- outp[-sIndex]
> describe(outpTrain)
outpTrain 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
     139        0      125        1    21.45    3.894    16.11    17.41 
     .25      .50      .75      .90      .95 
   19.19    21.66    23.54    25.62    27.20 

lowest : 12.04 12.62 13.03 14.45 14.61, highest: 27.70 27.95 28.16 29.45 31.30
> describe(outpTest)
outpTest 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
      56        0       55        1    21.75    3.586    16.99    17.48 
     .25      .50      .75      .90      .95 
   19.39    21.66    23.50    24.91    27.08 

lowest : 15.75 16.03 16.78 17.06 17.41, highest: 26.15 26.97 27.41 28.58 32.30

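PS: The data is partitioned according to the values of the dependent variable, outp$V1, where outp is the object that holds the dependent variable and V1 is the name of its column.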

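With p=0.7, roughly 70% of the samples go to the training set and roughly 30% to the test set (here 139 and 56 observations). Running describe on the two subsets shows that their means are close:

Training set mean: 21.45; test set mean: 21.75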
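
For a numeric outcome, the caret documentation describes createDataPartition as splitting the values into percentile-based groups and sampling within each group, which is why the two means end up close. The base-R sketch below illustrates that idea only; it is not caret's actual code. The vector y is synthetic data standing in for outp$V1, and the five bins and the fixed seed are assumptions chosen for the illustration.

```r
set.seed(42)                              # fixed seed so the split is repeatable
y <- rnorm(195, mean = 21.5, sd = 3.5)    # synthetic stand-in for outp$V1

# Cut y into 5 quantile-based bins (an illustrative choice).
breaks <- unique(quantile(y, probs = seq(0, 1, length.out = 6)))
bins   <- cut(y, breaks = breaks, include.lowest = TRUE)

# Within each bin, draw about 70% of the row indices for the training set.
pickTrain <- function(idx) {
  idx[sample.int(length(idx), size = ceiling(0.7 * length(idx)))]
}
trainIdx <- sort(unlist(lapply(split(seq_along(y), bins), pickTrain),
                        use.names = FALSE))

yTrain <- y[trainIdx]
yTest  <- y[-trainIdx]

# Compare the sizes and the distributions of the two subsets.
c(train = length(yTrain), test = length(yTest))
summary(yTrain)
summary(yTest)
```

Because every bin contributes to both subsets, the centres of the two distributions tend to agree, while the most extreme values can still land on either side.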

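One open question, though: the five smallest values in the training set (12.04 to 14.61) are all below the smallest value in the test set (15.75). How can the split be made more even?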
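
One thing that may help, offered as a suggestion rather than a guaranteed fix, is the groups argument of createDataPartition, which controls how many quantile break points are used for a numeric outcome (the default is min(5, length(y))). A larger value stratifies the tails more finely. The sketch below uses synthetic data in place of outp$V1, and the value 20 and the seed are arbitrary assumptions. In any single random split each extreme value still falls into only one subset, and with p=0.7 the training set is the more likely one.

```r
library(caret)

set.seed(42)                              # fixed seed so the split is repeatable
y <- rnorm(195, mean = 21.5, sd = 3.5)    # synthetic stand-in for outp$V1

# Default stratification versus a finer one (groups = 20 is an arbitrary choice).
idx5  <- createDataPartition(y, p = 0.7, list = FALSE)[, 1]
idx20 <- createDataPartition(y, p = 0.7, list = FALSE, groups = 20)[, 1]

# Compare the tails and the median of the training and test subsets.
probs <- c(0, 0.05, 0.5, 0.95, 1)
rbind(train5  = quantile(y[idx5],   probs),
      test5   = quantile(y[-idx5],  probs),
      train20 = quantile(y[idx20],  probs),
      test20  = quantile(y[-idx20], probs))
```

Repeating the comparison over several seeds gives a better picture than a single split, since the result of any one run depends on the random draw.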


Original article: https://www.cnblogs.com/qianheng/p/10804421.html
