Implementing KNN in Python to Recognize Handwritten Digits

Posted: 2017-07-27 13:39:48

I wrote a KNN algorithm that recognizes handwritten digits; the code is shown below. Reference: http://blog.csdn.net/april_newnew/article/details/44176059.

# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
import os
def readtxt(filename):
    # Read one sample file; note that each whole line is parsed as a single float.
    with open(filename, 'r', encoding='utf-8') as f:
        text = f.readlines()
    txt = np.array(text, dtype=float)
    return txt.tolist()

def readdata(rootfile):
    # Walk the directory tree and load every file as one sample.
    data = []
    label = []
    for root, dirs, files in os.walk(rootfile):
        for name in files:
            filename = os.path.join(root, name)
            data.append(readtxt(filename))
            # The class label is the part of the file name before the first underscore.
            label.append(name.split('_')[0])
    data = pd.DataFrame(data)
    return data, label

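def KNN(traindata, trainlabel, testdatai, K):
    # Repeat the test sample once per training row so the frames subtract row by row.
    length = len(traindata)
    newtest = pd.DataFrame(np.tile(testdatai, (length, 1)))
    # Euclidean distance from the test sample to every training sample.
    diff = (newtest - traindata) ** 2
    cha = diff.sum(axis=1) ** 0.5
    result = pd.DataFrame({'label': trainlabel, 'cha': cha})
    # Keep the K nearest neighbours and count the votes per label.
    labels = result.sort_values(by='cha')[:K]
    frequent = labels.groupby('label').size()
    # idxmax returns the label with the most votes (argmax returns a position in current pandas).
    labely = frequent.idxmax()
    return labely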
        
def test(trainfile, testfile, K):
    result = []
    traindata, trainlabel = readdata(trainfile)
    testdata, testlabel = readdata(testfile)
    # Classify every test sample against the full training set.
    for i in range(len(testdata)):
        labely = KNN(traindata, trainlabel, testdata.loc[i, :], K)
        result.append(labely)
    tongji = pd.DataFrame({'result': result, 'testlabel': testlabel})
    # Accuracy is the share of predictions that match the true label.
    accuracy = len(tongji[tongji['result'] == tongji['testlabel']]) / len(result)
    return result, accuracy
    
trainfile = r'E:\trainingDigits'
testfile = r'E:\testDigits'
K = 3
result, accuracy = test(trainfile, testfile, K)
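
The distance and voting steps of the KNN function can also be written directly with NumPy broadcasting, which avoids building a tiled copy of the test sample. This is only a minimal sketch, assuming the training data fits in a 2-D float array; the name knn_predict and the toy data are hypothetical and not part of the original post.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training rows closest to query."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Broadcasting subtracts the query from every training row at once.
    dists = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    # Indices of the k smallest Euclidean distances.
    nearest = np.argsort(dists, kind="stable")[:k]
    # Majority vote; on a tie, the label seen first among the neighbours wins.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two clusters with string labels.
toy_x = [[0, 0], [0, 1], [5, 5], [6, 5]]
toy_y = ["0", "0", "1", "1"]
toy_pred = knn_predict(toy_x, toy_y, [5, 6], k=3)
```

Here the three nearest neighbours of [5, 6] are two points labelled "1" and one labelled "0", so the vote goes to "1".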
            

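Note: the training set has 2,210 records and the test set has 670. The accuracy is low, only 0.45. I don't yet know why; I will keep studying and try to improve the code.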
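
One thing that may be worth checking, offered as a guess and not a diagnosis: readtxt converts each whole line of a file into a single float. If the files are text grids of 0/1 characters with one image row per line (an assumption; the post does not describe the file format), a 32-character row becomes one very large number, and a 64-bit float cannot hold that many digits exactly, so rows that differ can end up with the same value. The sketch below parses one character per feature under that assumption; the name read_digit_grid and the sample lines are hypothetical.

```python
import numpy as np


def read_digit_grid(lines):
    """Flatten lines of '0'/'1' characters into one vector, one entry per character."""
    pixels = []
    for line in lines:
        row = line.strip()
        if row:  # skip blank lines
            pixels.extend(float(ch) for ch in row)
    return np.array(pixels, dtype=float)


# Hypothetical 2x4 sample standing in for the lines of one file.
sample = ["0110\n", "1001\n"]
vec = read_digit_grid(sample)

# Two 32-character rows that differ only in the last pixel...
row_a = "1" * 31 + "0"
row_b = "1" * 31 + "1"
# ...become the same float64 when each row is parsed as one number.
collapsed = float(row_a) == float(row_b)
```

If the files do follow this layout, swapping a parser like this into readdata would give KNN one feature per pixel; whether that explains the 0.45 accuracy would need to be confirmed against the actual data.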

Original article: http://www.cnblogs.com/chenyaling/p/7244266.html
