C4.5分类决策树算法,其核心算法是ID3算法。目前应用在临床决策、生产制造、文档分析、生物信息学、空间数据建模等领域。算法的输入是带类标的数据,输出是树形的决策规则。
C4.5比ID3的改进:
1)用信息增益率来选择属性,克服了用信息增益选择属性时偏向选择取值多的属性的不足;
2)在树构造过程中进行剪枝;
3)能够完成对连续属性的离散化处理;
4)能够对不完整数据进行处理。
C4.5算法优点:产生的分类规则易于理解,准确率较高。
C4.5算法缺点:在构造树的过程中,需要对数据集进行多次的顺序扫描和排序,因而导致算法的低效。
算法过程:
用SQL算法实现c4.5算法的核心代码:
drop procedure if exists buildtree; DELIMITER | create procedure buildtree() begin declare le int default 1; declare letemp int default 1; declare current_num int default 1; declare current_class varchar(20); declare current_gain double; declare current_table varchar(20) default 'weather'; update infoset set state=1,statetemp=1; rr:while (1=1) do set @weather = (select play from weather where class is null limit 0,1); set @feature =(select x from infoset where statetemp=1 limit 0,1); if (@weather is null ) then leave rr; else if(@feature is null) then update infoset set statetemp = state; end if; end if; if (@weather is not null) then b:begin set current_gain = (select max(info_Gain) from infoset where statetemp=1); set current_class = (select x from infoset where info_Gain = current_gain); drop table if exists aa; set @a=concat('create temporary table aa select distinct ',current_class,' as namee from weather where class is null'); prepare stmt1 from @a; execute stmt1; tt:while (1=1) do set @x = (select namee from aa limit 0,1); if (@x is not null) then a0:begin drop table if exists bb; set @b=concat('create temporary table bb select * from ', current_table,' where ',current_class,' = ? and class is null'); prepare stmt2 from @b; execute stmt2 using @x; set @count = (select count(distinct play) from bb); if (@count =1) then a1:begin update weather set class = current_num,levelnum = letemp where id in (select id from bb); set current_num = current_num+1; if (current_table ='cc') then delete from cc where id in (select id from bb); end if; set @f=(select play from cc limit 0,1); if (@f is null) then set current_table='weather'; update infoset set statetemp=state; set letemp =le; end if; delete from aa where namee = @x; end a1; end if; if (@count>1) then drop table if exists cc; create temporary table cc select * from bb; set current_table = 'cc'; set letemp = letemp+1; leave tt; end if; if(@count=0) then delete from aa where namee = @x; set le = le+1; end if; end a0; else update infoset set state=0 where x=current_class; leave tt; end if; end while; update infoset set statetemp=0 where x=current_class; end b; end if; end while; end | delimiter ;
运行后分类结果如下图所示:
其中,class属性记录分类结果,levelnum属性记录各个数据在决策树中的层数。
对程序中各个表的解释:
原文地址:http://blog.csdn.net/iemyxie/article/details/39519767