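The following dataset contains expert diagnosis records on whether a patient can be fitted with contact lenses (taken from the book Data Mining, 《数据挖掘》). The goal here is to extract classification rules from it in R.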
Building the data
```r
# Encode the 24 records of the contact lens data; each attribute is a factor
age = factor(c(rep("young",8),rep("pre-presbyopic",8),rep("presbyopic",8)))
spectacle = factor(rep(c(rep("myope",4),rep("hypermetrop",4)),3))
astimatism = factor(rep(c("no","no","yes","yes"),6))
tear = factor(rep(c("reduced","normal"),12))
recommended = factor(c("none","soft","none","hard","none","soft","none","hard","none",
                       "soft","none","hard","none","soft","none","none","none","none",
                       "none","hard","none","soft","none","none"))
# The data frame is named df2, which is the name the model calls below use
df2 <- data.frame(age,spectacle,astimatism,tear,recommended)
```
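The four attributes vary in a fixed nested order: age changes slowest and tear changes fastest. Because of this, the same rows can also be generated with base R's expand.grid(). The sketch below builds the data that way and checks that the result matches the hand-typed df2. The name df3 is only illustrative. The class labels still have to be typed in by hand.

```r
# Hand-built version, as in the original post
df2 <- data.frame(
  age = factor(c(rep("young", 8), rep("pre-presbyopic", 8), rep("presbyopic", 8))),
  spectacle = factor(rep(c(rep("myope", 4), rep("hypermetrop", 4)), 3)),
  astimatism = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none","soft","none","hard","none","soft","none","hard",
                         "none","soft","none","hard","none","soft","none","none",
                         "none","none","none","hard","none","soft","none","none"))
)

# expand.grid varies its FIRST argument fastest, so tear goes first and age last
grid <- expand.grid(
  tear       = c("reduced", "normal"),
  astimatism = c("no", "yes"),
  spectacle  = c("myope", "hypermetrop"),
  age        = c("young", "pre-presbyopic", "presbyopic"),
  stringsAsFactors = FALSE
)

# Reorder the columns to match df2; factor() sorts levels alphabetically,
# which gives the same levels as the hand-built code
df3 <- data.frame(lapply(grid[c("age", "spectacle", "astimatism", "tear")], factor))
df3$recommended <- df2$recommended  # class labels cannot be generated, only copied

stopifnot(isTRUE(all.equal(df2, df3)))
str(df3)
```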
Generating the rules
```r
library(rpart)  # recursive partitioning (decision tree) package
model <- rpart(formula = recommended ~ ., data = df2)
summary(model)
```

```
Call:
rpart(formula = recommended ~ ., data = df2)
  n= 24 

         CP nsplit rel error   xerror      xstd
1 0.2222222      0 1.0000000 1.000000 0.2635231
2 0.0100000      1 0.7777778 1.333333 0.2721655

Variable importance
tear 
 100 

Node number 1: 24 observations,    complexity param=0.2222222
  predicted class=none  expected loss=0.375  P(node) =1
    class counts:     4    15     5
   probabilities: 0.167 0.625 0.208 
  left son=2 (12 obs) right son=3 (12 obs)
  Primary splits:
      tear       splits as  RL,  improve=5.0833330, (0 missing)
      astimatism splits as  RL,  improve=1.7500000, (0 missing)
      age        splits as  RRL, improve=0.2916667, (0 missing)
      spectacle  splits as  RL,  improve=0.2500000, (0 missing)

Node number 2: 12 observations
  predicted class=none  expected loss=0  P(node) =0.5
    class counts:     0    12     0
   probabilities: 0.000 1.000 0.000 

Node number 3: 12 observations
  predicted class=soft  expected loss=0.5833333  P(node) =0.5
    class counts:     4     3     5
   probabilities: 0.333 0.250 0.417 
```
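rpart stops after a single split because of its default control setting minsplit = 20. That setting means a node is only considered for splitting if it holds at least 20 observations, and node 3 holds only 12. The following sketch is not part of the original post. It lowers minsplit to 2 so the tree can grow further. It then turns each leaf into an IF ... THEN rule using rpart's path.rpart(). The variable names fit, is_leaf, and rules are illustrative. The exact rules printed depend on the tree that rpart grows.

```r
library(rpart)

# Contact lens data, encoded as in the original post
df2 <- data.frame(
  age = factor(c(rep("young", 8), rep("pre-presbyopic", 8), rep("presbyopic", 8))),
  spectacle = factor(rep(c(rep("myope", 4), rep("hypermetrop", 4)), 3)),
  astimatism = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none","soft","none","hard","none","soft","none","hard",
                         "none","soft","none","hard","none","soft","none","none",
                         "none","none","none","hard","none","soft","none","none"))
)

# Attempt splits in nodes with as few as 2 observations (default minsplit is 20)
fit <- rpart(recommended ~ ., data = df2, control = rpart.control(minsplit = 2))

# Leaves are the rows of fit$frame whose split variable is "<leaf>";
# the row names of fit$frame are the node numbers
is_leaf  <- fit$frame$var == "<leaf>"
leaf_ids <- as.numeric(rownames(fit$frame)[is_leaf])

# For each leaf, path.rpart() returns the split labels from the root down
paths <- path.rpart(fit, nodes = leaf_ids, print.it = FALSE)

# For a classification tree, yval is the index of the predicted class
classes <- attr(fit, "ylevels")[fit$frame$yval[is_leaf]]

# Drop the leading "root" label and join the conditions into one rule per leaf
rules <- mapply(function(p, cls) paste("IF", paste(p[-1], collapse = " AND "), "THEN", cls),
                paths, classes, USE.NAMES = FALSE)
writeLines(rules)
```

Each rule describes one leaf, so together the rules cover all 24 records. The first condition of every rule is the root split on tear.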
Visualization
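```r
par(xpd = TRUE)  # let labels extend past the plot region so they are not clipped
plot(model)      # draw the tree structure
text(model)      # label the splits and the predicted class at each leaf
```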
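par(xpd = TRUE) keeps node labels from being cut off at the edge of the plot. The sketch below is a variant that writes the tree to a PDF file instead of the screen. It uses plot.rpart's margin argument to leave room for the labels. It also uses text.rpart's use.n = TRUE to print the class counts at each leaf. The output path is a hypothetical file in R's temporary directory.

```r
library(rpart)

# Contact lens data, encoded as in the original post
df2 <- data.frame(
  age = factor(c(rep("young", 8), rep("pre-presbyopic", 8), rep("presbyopic", 8))),
  spectacle = factor(rep(c(rep("myope", 4), rep("hypermetrop", 4)), 3)),
  astimatism = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none","soft","none","hard","none","soft","none","hard",
                         "none","soft","none","hard","none","soft","none","none",
                         "none","none","none","hard","none","soft","none","none"))
)
model <- rpart(recommended ~ ., data = df2)

# Hypothetical output location: a PDF in the session's temporary directory
out <- file.path(tempdir(), "contact_lens_tree.pdf")
pdf(out, width = 7, height = 5)
plot(model, uniform = TRUE, margin = 0.1)  # uniform spacing; margin adds white space for labels
text(model, use.n = TRUE)                  # append the per-class counts to each leaf label
invisible(dev.off())
```

The pdf() device is part of base R's grdevices package. It works in a non-interactive session where no screen device is available.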
Summary statistics from the C5.0 algorithm
This model was fitted on the same formula with C5.0() from the C50 package. Its summary() output is:

```
Call:
C5.0.formula(formula = recommended ~ ., data = df2)


C5.0 [Release 2.07 GPL Edition]     Mon Mar 09 14:47:09 2015
-------------------------------

Class specified by attribute `outcome'

Read 24 cases (5 attributes) from undefined.data

Decision tree:

tear = reduced: none (12)
tear = normal:
:...astimatism = no: soft (6/1)
    astimatism = yes: hard (6/2)


Evaluation on training data (24 cases):

        Decision Tree
      ----------------
      Size      Errors

         3    3(12.5%)   <<


       (a)   (b)   (c)    <-classified as
      ----  ----  ----
         4                (a): class hard
         2    12     1    (b): class none
                     5    (c): class soft


    Attribute usage:

    100.00%  tear
     50.00%  astimatism


Time: 0.0 secs
```
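The C5.0 tree amounts to three rules. Applying those rules by hand is a quick way to check its confusion matrix and error count. The sketch below uses base R only and does not call the C50 package. The names pred, cm, and errors are illustrative.

```r
# Contact lens data, encoded as in the original post
df2 <- data.frame(
  age = factor(c(rep("young", 8), rep("pre-presbyopic", 8), rep("presbyopic", 8))),
  spectacle = factor(rep(c(rep("myope", 4), rep("hypermetrop", 4)), 3)),
  astimatism = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none","soft","none","hard","none","soft","none","hard",
                         "none","soft","none","hard","none","soft","none","none",
                         "none","none","none","hard","none","soft","none","none"))
)

# The C5.0 tree as rules: reduced tear rate -> none;
# otherwise astigmatism "no" -> soft, "yes" -> hard
pred <- ifelse(df2$tear == "reduced", "none",
               ifelse(df2$astimatism == "no", "soft", "hard"))
pred <- factor(pred, levels = levels(df2$recommended))

# Rows are the true classes, columns the predicted ones, as in C5.0's matrix
cm <- table(actual = df2$recommended, predicted = pred)
print(cm)

errors <- sum(pred != df2$recommended)
cat("Errors:", errors, sprintf("(%.1f%%)", 100 * errors / nrow(df2)), "\n")
```

All three errors are actual none cases. Two of them fall into the hard leaf and one into the soft leaf. That gives 3/24 = 12.5%, which matches the C5.0 report.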
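Both models agree that the tear production rate (tear) is the attribute that drives the doctor's decision. When tear production is reduced, no contact lenses are recommended. When it is normal, C5.0 then uses astigmatism (astimatism) to choose between soft and hard lenses.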
Original post: http://www.cnblogs.com/Dearc/p/4323660.html