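The dataset below records expert diagnoses of whether a patient can wear contact lenses. It comes from the book Data Mining. The goal is to extract decision rules from it in R.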
Constructing the data
> # 24 cases: 3 age groups x 2 prescriptions x 2 astigmatism values x 2 tear rates
> age = factor(c(rep("young",8),rep("pre-presbyopic",8),rep("presbyopic",8)))
> spectacle = factor(rep(c(rep("myope",4),rep("hypermetrop",4)),3))
> astimatism = factor(rep(c("no","no","yes","yes"),6))
> tear = factor(rep(c("reduced","normal"),12))
> recommended = factor(c("none","soft","none","hard","none","soft","none","hard","none",
"soft","none","hard","none","soft","none","none","none","none",
"none","hard","none","soft","none","none"))
> df2 <- data.frame(age,spectacle,astimatism,tear,recommended)
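The four attributes form a full factorial design (3 x 2 x 2 x 2 = 24 combinations). Because of that, the same rows can also be generated with base R's expand.grid(), which varies its first argument fastest. This is a minimal sketch that assumes the row order used above, with tear changing fastest and age slowest. The intermediate name grid is illustrative.

```r
# expand.grid() varies its FIRST argument fastest, so tear goes first and age last
grid <- expand.grid(
  tear       = c("reduced", "normal"),
  astimatism = c("no", "yes"),
  spectacle  = c("myope", "hypermetrop"),
  age        = c("young", "pre-presbyopic", "presbyopic"),
  stringsAsFactors = FALSE
)

# Put the columns in the post's order and convert them to factors;
# factor() sorts levels alphabetically, just like the original factor() calls
df2 <- data.frame(lapply(grid[, c("age", "spectacle", "astimatism", "tear")], factor))

# Expert recommendations, in the same row order as the grid
df2$recommended <- factor(c("none", "soft", "none", "hard", "none", "soft", "none", "hard",
                            "none", "soft", "none", "hard", "none", "soft", "none", "none",
                            "none", "none", "none", "hard", "none", "soft", "none", "none"))

str(df2)
table(df2$recommended)   # class counts: hard, none, soft
```

Compared with typing out each rep() pattern, this form makes the 24-row design explicit. The catch is that the argument order of expand.grid() alone determines the row order, so it has to stay aligned with the hand-typed recommendation vector.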
Rule generation
> library(rpart)
> model <- rpart(formula = recommended ~ ., data = df2)
> summary(model)
Call:
rpart(formula = recommended ~ ., data = df2)
n= 24
         CP nsplit rel error   xerror      xstd
1 0.2222222      0 1.0000000 1.000000 0.2635231
2 0.0100000      1 0.7777778 1.333333 0.2721655
Variable importance
tear
100
Node number 1: 24 observations, complexity param=0.2222222
predicted class=none expected loss=0.375 P(node) =1
class counts: 4 15 5
probabilities: 0.167 0.625 0.208
left son=2 (12 obs) right son=3 (12 obs)
Primary splits:
tear splits as RL, improve=5.0833330, (0 missing)
astimatism splits as RL, improve=1.7500000, (0 missing)
age splits as RRL, improve=0.2916667, (0 missing)
spectacle splits as RL, improve=0.2500000, (0 missing)
Node number 2: 12 observations
predicted class=none expected loss=0 P(node) =0.5
class counts: 0 12 0
probabilities: 0.000 1.000 0.000
Node number 3: 12 observations
predicted class=soft expected loss=0.5833333 P(node) =0.5
class counts: 4 3 5
probabilities: 0.333 0.250 0.417
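The rpart tree stops after a single split on tear. This is due to rpart's default rpart.control(minsplit = 20), under which a node is considered for splitting only if it holds at least 20 observations. Node 3 holds 12. The sketch below lowers that limit so the normal-tear branch can be split as well. The value minsplit = 2 is an illustrative choice. With only 24 cases, the deeper tree will fit the training data closely and should not be read as a better model.

```r
library(rpart)

# Contact-lens data, same rows as in the post
df2 <- data.frame(
  age         = factor(rep(c("young", "pre-presbyopic", "presbyopic"), each = 8)),
  spectacle   = factor(rep(rep(c("myope", "hypermetrop"), each = 4), 3)),
  astimatism  = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear        = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none", "soft", "none", "hard", "none", "soft", "none", "hard",
                         "none", "soft", "none", "hard", "none", "soft", "none", "none",
                         "none", "none", "none", "hard", "none", "soft", "none", "none"))
)

# Default control (minsplit = 20): the 12-case node 3 is never split
fit_default <- rpart(recommended ~ ., data = df2)

# Illustrative setting: allow splits on nodes with as few as 2 cases
fit_full <- rpart(recommended ~ ., data = df2,
                  control = rpart.control(minsplit = 2))

print(fit_full)
```

In the relaxed tree, the normal-tear node should next be split on astimatism, the same attribute C5.0 uses below.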
Visualization
> par(xpd = TRUE)
> plot(model)
> text(model)
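The aim of the post is rule extraction, so the leaves of the rpart tree can also be printed as IF ... THEN ... text instead of only being plotted. rpart's path.rpart() returns the split conditions on the path from the root to each requested node. The sketch below combines those conditions with each leaf's predicted class. The variable names and the "IF ... THEN" wording are illustrative.

```r
library(rpart)

# Contact-lens data, same rows as in the post
df2 <- data.frame(
  age         = factor(rep(c("young", "pre-presbyopic", "presbyopic"), each = 8)),
  spectacle   = factor(rep(rep(c("myope", "hypermetrop"), each = 4), 3)),
  astimatism  = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear        = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none", "soft", "none", "hard", "none", "soft", "none", "hard",
                         "none", "soft", "none", "hard", "none", "soft", "none", "none",
                         "none", "none", "none", "hard", "none", "soft", "none", "none"))
)

model <- rpart(recommended ~ ., data = df2)

# Leaf rows of the tree frame carry "<leaf>" in the var column;
# the row names are the node numbers
frame    <- model$frame
is_leaf  <- frame$var == "<leaf>"
leaf_ids <- as.numeric(rownames(frame)[is_leaf])

# Split conditions from the root down to each leaf (first element is "root")
paths <- path.rpart(model, nodes = leaf_ids, print.it = FALSE)

# For a classification tree, yval is the index of the predicted class
classes <- attr(model, "ylevels")[frame$yval[is_leaf]]

rules <- vapply(seq_along(leaf_ids), function(i) {
  conds <- setdiff(paths[[as.character(leaf_ids[i])]], "root")
  paste("IF", paste(conds, collapse = " AND "), "THEN recommended =", classes[i])
}, character(1))

writeLines(rules)
```

The same loop works unchanged for deeper trees, for example one grown with a smaller minsplit, because each rule is simply the conjunction of all splits above a leaf.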

Summary of the C5.0 model (output of summary() for a tree fitted with C5.0() from the C50 package)
Call:
C5.0.formula(formula = recommended ~ ., data = df2)
C5.0 [Release 2.07 GPL Edition] Mon Mar 09 14:47:09 2015
-------------------------------
Class specified by attribute `outcome'
Read 24 cases (5 attributes) from undefined.data
Decision tree:
tear = reduced: none (12)
tear = normal:
:...astimatism = no: soft (6/1)
    astimatism = yes: hard (6/2)
Evaluation on training data (24 cases):
        Decision Tree
      ----------------
      Size      Errors
         3    3(12.5%)   <<
       (a)   (b)   (c)    <-classified as
      ----  ----  ----
         4                (a): class hard
         2    12     1    (b): class none
                     5    (c): class soft
Attribute usage:
100.00% tear
50.00% astimatism
Time: 0.0 secs
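The 3 training errors (12.5%) reported by C5.0 can be reproduced in base R by applying the three printed rules by hand. This is only a check of the printed tree, not a re-implementation of the C5.0 algorithm.

```r
# Contact-lens data, same rows as in the post
df2 <- data.frame(
  age         = factor(rep(c("young", "pre-presbyopic", "presbyopic"), each = 8)),
  spectacle   = factor(rep(rep(c("myope", "hypermetrop"), each = 4), 3)),
  astimatism  = factor(rep(c("no", "no", "yes", "yes"), 6)),
  tear        = factor(rep(c("reduced", "normal"), 12)),
  recommended = factor(c("none", "soft", "none", "hard", "none", "soft", "none", "hard",
                         "none", "soft", "none", "hard", "none", "soft", "none", "none",
                         "none", "none", "none", "hard", "none", "soft", "none", "none"))
)

# The three rules printed by C5.0:
#   tear = reduced                  -> none
#   tear = normal, astimatism = no  -> soft
#   tear = normal, astimatism = yes -> hard
pred <- with(df2, ifelse(tear == "reduced", "none",
                         ifelse(astimatism == "no", "soft", "hard")))
pred <- factor(pred, levels = levels(df2$recommended))

errors <- sum(pred != df2$recommended)
cat("errors:", errors, sprintf("(%.1f%%)\n", 100 * errors / nrow(df2)))
# → errors: 3 (12.5%)

# Rows = true class, columns = predicted class, as in C5.0's confusion matrix
cm <- table(actual = df2$recommended, predicted = pred)
print(cm)
```

The off-diagonal entries show where the errors come from. Two "none" cases fall into the hard-lens leaf and one into the soft-lens leaf, which matches the (6/2) and (6/1) counts in the printed tree.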
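Conclusion: the attribute that drives the doctor's decision is the patient's tear production rate. Every patient with reduced tear production is advised against contact lenses. For patients with normal tear production, C5.0 then uses astigmatism to choose between soft and hard lenses.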
Original post: http://www.cnblogs.com/Dearc/p/4323660.html