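Feature Set Analysis
The dataset is letter-recognition.data. It contains 20,000 records with comma-separated fields. A sample record is shown below: the first column is the letter label and the remaining 16 columns are the features.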
T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
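To make the record layout concrete, here is a minimal sketch that splits the sample record into a numeric label and a feature list. The helper name `parse_record` is hypothetical and used only for illustration; the label mapping (`ord(ch) - ord('A')`) is the same one the code in this post applies when loading the file.

```python
def parse_record(line):
    """Split one comma-separated record into (label_index, features)."""
    fields = line.strip().split(",")
    # Map the letter in the first column to an index: 'A' -> 0, ..., 'Z' -> 25.
    label = ord(fields[0]) - ord("A")
    # The remaining columns are integer feature values.
    features = [int(value) for value in fields[1:]]
    return label, features


label, features = parse_record("T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8")
print(label, len(features))
```

For the sample record, the letter T maps to index 19 and 16 feature values remain.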
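Learning Method
1. Read the data and strip the delimiters.
2. Use the first column as the labels and the remaining columns as the training data.
3. Initialize the classifier and train it on the training data.
4. Verify the accuracy on the test data.
Code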
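The steps above can be sketched without any library, which may help show what a k-nearest-neighbours classifier does internally. This is a plain-Python stand-in under stated assumptions, not the OpenCV classifier used in this post: the function names `knn_predict` and `accuracy` and the toy two-dimensional data are hypothetical, and the distance is assumed to be squared Euclidean with a simple majority vote.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    # Pair each training sample's squared Euclidean distance with its label.
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=5):
    """Return the percentage of test samples whose predicted label is correct."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return correct * 100.0 / len(test_y)


# Toy data (hypothetical): two well-separated clusters with labels 0 and 1.
train_x = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 8], [8, 9]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[1, 1], [8, 8]]
test_y = [0, 1]

print(accuracy(train_x, train_y, test_x, test_y, k=3))
```

A brute-force search like this compares every query against every training sample, so it is only meant to illustrate the idea on small inputs.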
# Written for Python 2 and OpenCV 2.4.x (print statements, cv2.KNearest, find_nearest).
import cv2
import numpy as np

print 'load data'
# Column 0 holds the letter; map 'A'..'Z' to 0..25 so the whole array is numeric.
data = np.loadtxt('letter-recognition.data', dtype='float32', delimiter=',',
                  converters={0: lambda ch: ord(ch) - ord('A')})

print 'split as train,test'
# First half of the rows for training, second half for testing.
train, test = np.vsplit(data, 2)
print 'train.shape:\t', train.shape
print 'test.shape:\t', test.shape

print 'split train as the response,trainData'
# Column 0 is the label (response); columns 1..16 are the features.
response, trainData = np.hsplit(train, [1])
print 'response.shape:\t', response.shape
print 'trainData.shape:\t', trainData.shape

print 'split the test as response,trainData'
restest, testData = np.hsplit(test, [1])

print 'Init the knn'
knn = cv2.KNearest()
knn.train(trainData, response)

print 'test the knn'
# Classify every test row by majority vote among its 5 nearest neighbours.
ret, result, neighbours, dist = knn.find_nearest(testData, 5)

print 'the rate:'
correct = np.count_nonzero(result == restest)
# Percentage of correct predictions over all test rows (10000 here).
accuracy = correct * 100.0 / result.size
print 'accuracy is', accuracy, '%'
Results
load data
split as train,test
train.shape:	(10000, 17)
test.shape:	(10000, 17)
split train as the response,trainData
response.shape:	(10000, 1)
trainData.shape:	(10000, 16)
split the test as response,trainData
Init the knn
test the knn
the rate:
accuracy is 93.22 %
Dataset
http://download.csdn.net/detail/licong_carp/8612383
Original article: http://blog.csdn.net/licong_carp/article/details/45149197