Machine Learning in Action Notes: The Imbalanced Classification Problem

Posted: 2014-08-18 22:04:13

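Normally, the error rate of a classifier's predictions is enough to judge its quality. However, when the numbers of positive and negative examples in the training data are very unequal, this criterion becomes misleading. For example, if 99% of the samples are negative, a classifier that always predicts "negative" has an error rate of only 1% yet never detects a single positive example. This situation is known as the imbalanced classification problem, and the following metrics are more informative in this case.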
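A minimal sketch of why the error rate misleads on imbalanced data. The data (95 negatives, 5 positives) and the helper name `error_rate` are hypothetical, chosen only for illustration: a trivial classifier that always predicts the negative class still reaches a 5% error rate while finding none of the positives.

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that disagree with the true labels."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

# Hypothetical imbalanced data set: 95 negatives (-1) and 5 positives (+1)
y_true = [-1] * 95 + [1] * 5

# A trivial "classifier" that labels every sample as negative
y_pred = [-1] * len(y_true)

err = error_rate(y_true, y_pred)
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print(f"error rate = {err:.2f}, positives detected = {positives_found}")
```

The low error rate says nothing about the minority class, which is exactly the gap that precision, recall and the ROC curve are meant to fill.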

(1) Precision and Recall

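Both metrics are defined in terms of the confusion matrix: TP (true positives) is the number of positive samples predicted as positive, FP (false positives) the number of negative samples predicted as positive, and FN (false negatives) the number of positive samples predicted as negative. Precision is the fraction of samples predicted as positive that are actually positive, TP/(TP+FP); recall is the fraction of actual positive samples that are predicted as positive, TP/(TP+FN). It is usually easy to build a classifier with high precision or with high recall, but hard to achieve both at once. If every sample is predicted as positive, for instance, recall reaches 100% while precision is low. Because building a classifier that maximizes precision and recall simultaneously is challenging, the two are often combined into a single number, the F1 score, F1 = 2 * precision * recall / (precision + recall), which is their harmonic mean; the larger, the better.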
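The definitions above translate directly into code. The sketch below assumes +1 is the positive class; the function name `precision_recall_f1` and the toy labels are illustrative, not from the original post. It counts TP, FP and FN, computes precision, recall and F1, and reproduces the "predict everything as positive" case.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    # Guard against division by zero when there are no predicted/actual positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 2 positives among 10 samples
y_true = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]

# Predicting every sample as positive: recall is perfect, precision is poor
all_pos = [1] * len(y_true)
print(precision_recall_f1(y_true, all_pos))

# A more selective classifier: one true positive, one false positive, one miss
selective = [1, -1, 1, -1, -1, -1, -1, -1, -1, -1]
print(precision_recall_f1(y_true, selective))
```

For the all-positive predictor this gives precision 0.2, recall 1.0 and F1 of about 0.33; the selective one gives 0.5 for all three. Even though the second classifier misses a positive, its F1 score is higher.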

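(2) ROC Curve

An ROC (receiver operating characteristic) curve plots the true positive rate, TP/(TP+FN), against the false positive rate, FP/(FP+TN), where TN is the number of true negatives, as the decision threshold is moved across the classifier's output scores. The diagonal line corresponds to random guessing. The area under the curve (AUC) summarizes the curve in a single number: 1.0 means every positive is scored above every negative, while 0.5 is no better than chance. The function below, written for the AdaBoost horse colic example, sorts the prediction strengths, draws the curve one sample at a time starting from the point (1, 1), and accumulates the AUC along the way.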


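import numpy as np
import matplotlib.pyplot as plt

def plotROC(predStrengths, classLabels):
    # predStrengths: classifier output scores (a 1-D array or a 1 x m matrix)
    # classLabels: true labels, +1.0 for positive and -1.0 for negative
    scores = np.asarray(predStrengths, dtype=float).ravel()
    labels = np.asarray(classLabels, dtype=float).ravel()
    cur = (1.0, 1.0)  # cursor, starts at the top-right corner (FPR = 1, TPR = 1)
    ySum = 0.0        # sum of column heights, used to calculate the AUC
    numPosClas = int(np.sum(labels == 1.0))
    yStep = 1 / float(numPosClas)
    xStep = 1 / float(len(labels) - numPosClas)
    # argsort returns indices in ascending score order, so the curve is
    # traced backwards, from (1, 1) towards (0, 0)
    sortedIndices = scores.argsort()
    fig = plt.figure()  # these three lines create the figure and axes
    fig.clf()
    ax = plt.subplot(111)
    # loop through all the samples, drawing a line segment at each point
    for index in sortedIndices:
        if labels[index] == 1.0:
            delX = 0; delY = yStep   # a positive: step down in TPR
        else:
            delX = xStep; delY = 0   # a negative: step left in FPR
            ySum += cur[1]           # height of the column being passed
        # draw line from cur to (cur[0]-delX, cur[1]-delY)
        ax.plot([cur[0], cur[0] - delX], [cur[1], cur[1] - delY], c='b')
        cur = (cur[0] - delX, cur[1] - delY)
    ax.plot([0, 1], [0, 1], 'b--')  # diagonal: random guessing
    plt.xlabel('False positive rate'); plt.ylabel('True positive rate')
    plt.title('ROC curve for AdaBoost horse colic detection system')
    ax.axis([0, 1, 0, 1])
    plt.show()
    auc = ySum * xStep
    print("the Area Under the Curve is: ", auc)
    return auc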
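To check the AUC without opening a plot window, the same cursor walk can be separated from the drawing. The sketch below is an illustration, not part of the original code: the function names `roc_points` and `auc_by_ranking` and the toy scores are hypothetical, labels are assumed to be +1.0/-1.0 as in plotROC, and scores are assumed to contain no ties. It returns the ROC vertices and the AUC, and cross-checks the result against an equivalent definition: the AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score.

```python
import numpy as np

def roc_points(scores, labels):
    """Walk the ROC curve from (1, 1) to (0, 0) as plotROC does, without plotting.

    Returns the list of (FPR, TPR) vertices and the area under the curve.
    Assumes labels are +1.0 / -1.0 and that scores contain no ties.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    num_pos = int(np.sum(labels == 1.0))
    y_step = 1.0 / num_pos
    x_step = 1.0 / (len(labels) - num_pos)
    cur = (1.0, 1.0)
    points = [cur]
    y_sum = 0.0
    for index in scores.argsort():               # ascending score order
        if labels[index] == 1.0:
            cur = (cur[0], cur[1] - y_step)      # positive: TPR drops
        else:
            y_sum += cur[1]                      # negative: add this column's height
            cur = (cur[0] - x_step, cur[1])      # FPR drops
        points.append(cur)
    return points, y_sum * x_step

def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    pos = scores[labels == 1.0]
    neg = scores[labels != 1.0]
    correct = sum(1 for p in pos for n in neg if p > n)
    return correct / (len(pos) * len(neg))

# Hypothetical classifier outputs for 6 samples
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3]
labels = [1.0, 1.0, -1.0, 1.0, -1.0, -1.0]

points, auc = roc_points(scores, labels)
print("ROC vertices:", [(round(x, 3), round(y, 3)) for x, y in points])
print("AUC (cursor walk):", auc)
print("AUC (pairwise ranking):", auc_by_ranking(scores, labels))
```

Both approaches give 8/9 (about 0.889) for this toy data. Only the positive scored 0.6 is ranked below a negative (0.7), so 8 of the 9 positive/negative pairs are ordered correctly.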


Author: 小村长  Source: http://blog.csdn.net/lu597203933 Reposting and sharing are welcome, but please credit the original source. (Sina Weibo: 小村长zack, feel free to get in touch!)


Tags: machine learning, evaluation metrics, imbalanced classification

Original post: http://blog.csdn.net/lu597203933/article/details/38666699
