
k-Nearest Neighbors in Python -- from Machine Learning in Action (《机器学习实战》)

Posted: 2017-11-08 22:17:51



'''
Created on Nov 06, 2017
kNN: k Nearest Neighbors

Input:      inX: vector to compare to existing dataset (1xN)
            dataSet: size m data set of known vectors (NxM)
            labels: data set labels (1xM vector)
            k: number of neighbors to use for comparison (should be an odd number)

Output:     the most popular class label

@author: Liu Chuanfeng
'''
import operator
import numpy as np
import matplotlib.pyplot as plt

def classify0(inX, dataSet, labels, k):
    # Compute Euclidean distances from inX to every row of dataSet
    dataSetSize = dataSet.shape[0]
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    # Majority vote among the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def file2matrix(filename):
    # Parse a tab-separated file: three numeric features per line, class label last
    with open(filename) as fr:
        arrayLines = fr.readlines()
    numberOfLines = len(arrayLines)
    returnMat = np.zeros((numberOfLines, 3))
    classLabelVector = []
    index = 0
    for line in arrayLines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector

def autoNorm(dataSet):
    # Min-max normalization: scale every feature column to [0, 1]
    maxVals = dataSet.max(0)
    minVals = dataSet.min(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = (dataSet - np.tile(minVals, (m, 1))) / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minVals

def datingClassTest():
    # Hold out the first 10% of the data for testing, classify against the rest
    hoRatio = 0.10
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifyResult = classify0(normMat[i, :], normMat[numTestVecs:m, :], datingLabels[numTestVecs:m], 3)
        print('the classifier came back with: %d, the real answer is: %d' % (classifyResult, datingLabels[i]))
        if classifyResult != datingLabels[i]:
            errorCount += 1.0
        print('the total error rate is: %.1f%%' % (errorCount / float(numTestVecs) * 100))

def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = np.array([ffMiles, percentTats, iceCream])
    # Normalize the query with the training set's min and range before classifying
    classifyResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print("You will probably like this person:", resultList[classifyResult - 1])

# Unit test of func: file2matrix()
#datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
#print(datingDataMat)
#print(datingLabels)

# Usage of figure construction of matplotlib
#fig = plt.figure()
#ax = fig.add_subplot(111)
#ax.scatter(datingDataMat[:,1], datingDataMat[:,2], 15.0*np.array(datingLabels), 15.0*np.array(datingLabels))
#plt.show()

# Unit test of func: autoNorm()
#normMat, ranges, minVals = autoNorm(datingDataMat)
#print(normMat)
#print(ranges)
#print(minVals)

datingClassTest()
classifyPerson()
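Before running against the dating data file, classify0 can be sanity-checked on a tiny hand-made dataset. This is a minimal sketch (the toy points and labels are made up, not from the book's data); it repeats the classify0 routine above so the snippet runs on its own:

```python
import operator
import numpy as np

def classify0(inX, dataSet, labels, k):
    # Same kNN routine as above: Euclidean distance plus majority vote
    dataSetSize = dataSet.shape[0]
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    sortedDistIndicies = distances.argsort()
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# Four points in two clusters: 'A' near (1, 1), 'B' near (0, 0)
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

print(classify0(np.array([0.9, 0.8]), group, labels, 3))  # 'A'
print(classify0(np.array([0.1, 0.2]), group, labels, 3))  # 'B'
```

A query near (1, 1) picks up two 'A' neighbors among its three nearest and is voted 'A'; one near the origin is voted 'B'.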

Output:

the classifier came back with: 3, the real answer is: 3
the total error rate is: 0.0%
the classifier came back with: 2, the real answer is: 2
the total error rate is: 0.0%
the classifier came back with: 1, the real answer is: 1
the total error rate is: 0.0%

...

the classifier came back with: 2, the real answer is: 2
the total error rate is: 4.0%
the classifier came back with: 1, the real answer is: 1
the total error rate is: 4.0%
the classifier came back with: 3, the real answer is: 1
the total error rate is: 5.0%

 

percentage of time spent playing video games?10
frequent flier miles earned per year?10000
liters of ice cream consumed per year?0.5
You will probably like this person: in small doses
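The normalization step matters here: without it, the flier-miles feature (thousands) would swamp the other two (single digits) in the Euclidean distance. A minimal sketch of autoNorm on made-up numbers (toy values, not from the dating set) shows the scaling newValue = (old - min) / (max - min), and how a new query is scaled with the same statistics, as classifyPerson does:

```python
import numpy as np

def autoNorm(dataSet):
    # Same min-max normalization as above: newValue = (old - min) / (max - min)
    maxVals = dataSet.max(0)
    minVals = dataSet.min(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = (dataSet - np.tile(minVals, (m, 1))) / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minVals

# Toy data: one feature spans thousands, the other single digits
data = np.array([[0.0, 0.0], [10000.0, 1.0], [5000.0, 0.5]])
normed, ranges, minVals = autoNorm(data)
print(normed)  # every column now lies in [0, 1]
# A new query must be scaled with the training min and range
print((np.array([2500.0, 0.25]) - minVals) / ranges)  # [0.25 0.25]
```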

 

Reference:

Machine Learning in Action (《机器学习实战》)



Original article: http://www.cnblogs.com/knownx/p/7806231.html
