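Classification and regression trees: the CART (Classification And Regression Tree) algorithm uses binary recursive partitioning. It splits the current sample set into two subsets, so every non-leaf node of the resulting tree has exactly two branches. The decision tree that CART produces is therefore a structurally simple binary tree.
Classification trees rest on two basic ideas. The first is to build the tree by recursively partitioning the space of the independent variables using the training samples. The second is to prune the tree using validation data.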
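The second idea, pruning with validation data, can be illustrated with a small sketch. The text does not say which pruning procedure is meant, so this example swaps in reduced-error pruning as a simple stand-in (CART as usually described uses cost-complexity pruning instead). The dict-based tree layout and the names `predict`, `errors` and `prune` are assumptions made for illustration only.

```python
def predict(node, row):
    """Follow the binary tests down to a leaf and return its class label."""
    while "test" in node:
        feature, left_values = node["test"]
        node = node["left"] if row[feature] in left_values else node["right"]
    return node["label"]


def errors(node, rows, labels):
    """Count the validation samples that the (sub)tree misclassifies."""
    return sum(predict(node, r) != y for r, y in zip(rows, labels))


def prune(node, rows, labels):
    """Reduced-error pruning (an assumed stand-in, not the source's method).

    Works bottom-up: a subtree is replaced by a leaf carrying the node's
    own label whenever that does not increase the error on the validation
    samples that reach the node.
    """
    if "test" not in node:
        return node
    feature, left_values = node["test"]
    go_left = [r[feature] in left_values for r in rows]
    node["left"] = prune(
        node["left"],
        [r for r, g in zip(rows, go_left) if g],
        [y for y, g in zip(labels, go_left) if g],
    )
    node["right"] = prune(
        node["right"],
        [r for r, g in zip(rows, go_left) if not g],
        [y for y, g in zip(labels, go_left) if not g],
    )
    leaf = {"label": node["label"]}
    if errors(leaf, rows, labels) <= errors(node, rows, labels):
        return leaf
    return node


# Hypothetical tree whose "windy" split under "sunny" is overfitted.
tree = {
    "test": ("outlook", {"sunny"}), "label": "yes",
    "left": {
        "test": ("windy", {"true"}), "label": "no",
        "left": {"label": "yes"},
        "right": {"label": "no"},
    },
    "right": {"label": "yes"},
}
# Hypothetical validation data.
val_rows = [
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "sunny", "windy": "false"},
    {"outlook": "rainy", "windy": "true"},
    {"outlook": "overcast", "windy": "false"},
]
val_labels = ["no", "no", "yes", "yes"]

pruned = prune(tree, val_rows, val_labels)
```

In this toy case the `windy` test is collapsed into a single leaf, while the root split on `outlook` is kept because removing it would increase the validation error. Note that `prune` modifies the tree it is given in place.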
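CART_classification(DataSet, featureList, alpha):
    Create the root node R
    If all samples in DataSet belong to the same class, label R with that class
    If the height of the decision tree exceeds alpha, stop splitting and label R with classify(DataSet)
    Recursive case:
        Label R with classify(DataSet)
        Select an attribute F from featureList (choose the split with the smallest Gini(DataSet, F); for continuous attributes, follow the discretization procedure of C4.5, using the smallest Gini as the splitting criterion)
        Use F to split DataSet into two parts, DS_L and DS_R:
            If DS_L or DS_R is empty, stop splitting
            If neither DS_L nor DS_R is empty, recurse on both:
                C_L = CART_classification(DS_L, featureList, alpha);
                C_R = CART_classification(DS_R, featureList, alpha)
                Add the nodes C_L and C_R as the left and right children of R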
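The procedure above can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions, not the author's implementation: it handles categorical attributes only (the C4.5-style discretization of continuous attributes is left out), it takes Gini(DataSet, F) to be the size-weighted Gini impurity of the two subsets, it treats `alpha` as a maximum depth, and the helper names (`gini`, `binary_splits`, `best_split`, `predict`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gini(labels):
    """Gini impurity of a list of labels: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def binary_splits(values):
    """Yield the set of values sent left, once per distinct two-way partition."""
    first, *rest = sorted(values)
    # Keeping `first` on the left avoids mirrored duplicates; stopping
    # before len(rest) guarantees the right side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            yield {first, *extra}


def best_split(rows, labels, features):
    """Return (weighted Gini, feature, left values) of the best split, or None."""
    best, n = None, len(rows)
    for f in features:
        for left_values in binary_splits({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] in left_values]
            right = [y for r, y in zip(rows, labels) if r[f] not in left_values]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, left_values)
    return best


def cart_classification(rows, labels, features, alpha, depth=0):
    """Build a binary tree of nested dicts; alpha is used as a depth limit."""
    # classify(DataSet): every node carries the majority class of its samples.
    node = {"label": Counter(labels).most_common(1)[0][0]}
    if len(set(labels)) == 1 or depth >= alpha:
        return node
    split = best_split(rows, labels, features)
    if split is None:  # no split leaves both DS_L and DS_R non-empty
        return node
    _, feature, left_values = split
    go_left = [r[feature] in left_values for r in rows]
    node["test"] = (feature, left_values)
    node["left"] = cart_classification(
        [r for r, g in zip(rows, go_left) if g],
        [y for y, g in zip(labels, go_left) if g],
        features, alpha, depth + 1)
    node["right"] = cart_classification(
        [r for r, g in zip(rows, go_left) if not g],
        [y for y, g in zip(labels, go_left) if not g],
        features, alpha, depth + 1)
    return node


def predict(node, row):
    """Follow the binary tests down to a leaf and return its label."""
    while "test" in node:
        feature, left_values = node["test"]
        node = node["left"] if row[feature] in left_values else node["right"]
    return node["label"]


# Hypothetical toy data, not taken from the article.
rows = [
    {"outlook": "sunny", "windy": "false"},
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "overcast", "windy": "false"},
    {"outlook": "rainy", "windy": "false"},
    {"outlook": "rainy", "windy": "true"},
]
labels = ["no", "no", "yes", "yes", "no"]
tree = cart_classification(rows, labels, ["outlook", "windy"], alpha=10)
```

Because `binary_splits` only proposes partitions of the values actually present at a node, an empty DS_L or DS_R shows up here as "no split available" rather than as an explicit emptiness check.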
rr:while (1=1) do
    set @weather = (select id from weather where class = 0 limit 0,1);
    set @feature = (select parent from finalgini where statetemp=1 limit 0,1);
    if (@weather is null) then
        leave rr;
    else
        if (@feature is null) then
            update finalgini set statetemp = state;
        end if;
    end if;
    if (@weather is not null) then
    b:begin
        set current_gini = (select min(gini) from finalgini where statetemp=1);
        set current_class = (select parent from finalgini where gini = current_gini);
        drop table if exists aa;
        create temporary table aa (namee varchar(100));
        insert into aa select class from finalgini where parent=current_class;
        insert into aa select class2 from finalgini where parent=current_class;
        tt:while (1=1) do
            set @x = (select namee from aa limit 0,1);
            if (@x is not null) then
            a0:begin
                drop table if exists bb;
                set @b=concat('create temporary table bb as \(select id from ', current_table,' where ',current_class,' regexp \'',@x,'\' and class = 0 \)');
                prepare stmt2 from @b;
                execute stmt2;
                set @count = (select count(distinct play) from bb left join weather on bb.id = weather.id);
                if (@count = 1) then
                a1:begin
                    update bb left join weather on bb.id=weather.id set class = current_num;
                    set current_num = current_num+1;
                    if (current_table = 'cc') then
                        delete from cc where id in (select id from bb);
                    end if;
                    set @f = (select play from cc limit 0,1);
                    if (@f is null) then
                        set current_table='weather';
                        update finalgini set statetemp=state;
                    end if;
                    delete from aa where namee = @x;
                end a1;
                end if;
                if (@count > 1) then
                    set @id = (select count(id) from bb);
                    if (@id = 2) then
                    w:begin
                        update bb left join weather on bb.id=weather.id set class = current_num where play='yes';
                        set current_num = current_num+1;
                        update bb left join weather on bb.id=weather.id set class = current_num where play='no';
                        set current_num = current_num+1;
                        if (current_table = 'cc') then
                            delete from cc where id in (select id from bb);
                        end if;
                        set @f = (select play from cc limit 0,1);
                        if (@f is null) then
                            set current_table='weather';
                            update finalgini set statetemp=state;
                        end if;
                        delete from aa where namee = @x;
                    end w;
                    end if;
                    if (@id > 2) then
                        drop table if exists cc;
                        create temporary table cc select * from weather inner join bb using(id);
                        set current_table = 'cc';
                        leave tt;
                    end if;
                end if;
                if (@count = 0) then
                    delete from aa where namee = @x;
                end if;
            end a0;
            else
                update finalgini set state=0 where parent=current_class;
                leave tt;
            end if;
        end while;
        update finalgini set statetemp=0 where parent=current_class;
    end b;
    end if;
end while;
end |
delimiter ;
Explanation of the tables used in the program:
Original article: http://blog.csdn.net/iemyxie/article/details/39520537