Software versions:
Hadoop 2.6, MyEclipse 10.0, Maven 3.3.2
Source code: https://github.com/fansy1990/knn
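1. The KNN algorithm
If most of the k samples most similar to a given sample (that is, its k nearest neighbors in feature space) belong to one class, the sample is assigned to that class as well. The neighbors KNN chooses from are all objects that have already been correctly classified, so the classification decision depends only on the classes of the one or few nearest samples. (Adapted from the Baidu Baike entry on the nearest-neighbor algorithm, 《邻近算法》.)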
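The rule above can be sketched on a single machine before moving to MapReduce. This is a minimal illustration, not the code from the linked repository: it assumes Euclidean distance (the post does not name a metric), and the function names and toy data are made up for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k):
    """train is a list of (label, features); return the majority label
    among the k training samples closest to query."""
    nearest = sorted(train, key=lambda row: euclidean(row[1], query))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two clusters, labelled "a" and "b".
train = [("a", [0.0, 0.0]), ("a", [0.1, 0.2]),
         ("b", [5.0, 5.0]), ("b", [5.2, 4.9])]
print(knn_classify(train, [0.2, 0.1], k=3))  # two of the three nearest are "a"
```

Sorting the whole training set for every query is the simplest way to find the k nearest samples; the MapReduce design below avoids it by keeping only k candidates per test sample.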
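2. MapReduce implementation of KNN
Implementing KNN on Hadoop mainly comes down to designing the Mapper and Reducer so that the data flow fits Hadoop's MR framework: <K1,V1> --> <K2,V2>, then <K2,List<V2>> --> <K3,V3>.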
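To make the <K1,V1> --> <K2,V2> --> <K2,List<V2>> --> <K3,V3> flow concrete, here is a small in-memory imitation of it. It is only a sketch of the data flow, not Hadoop's API: run_mapreduce, label_mapper and count_reducer are hypothetical names, and the example job (counting samples per label) was chosen for brevity.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle (group by K2) -> reduce in memory."""
    grouped = defaultdict(list)
    for k1, v1 in records:                # <K1,V1>
        for k2, v2 in mapper(k1, v1):     # --> <K2,V2>
            grouped[k2].append(v2)        # shuffle: <K2,List<V2>>
    output = []
    for k2 in sorted(grouped):
        output.extend(reducer(k2, grouped[k2]))  # --> <K3,V3>
    return output

def label_mapper(offset, line):
    # Emit the first column (the label) with a count of 1.
    yield line.split(",")[0], 1

def count_reducer(label, ones):
    yield label, sum(ones)

records = list(enumerate(["1,0.5,0.2", "0,0.1,0.9", "1,0.4,0.3"]))
print(run_mapreduce(records, label_mapper, count_reducer))
```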
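2.1. Mapper design
A Mapper has three functions: setup(), map() and cleanup(). setup() runs once at the start, cleanup() runs once at the end, and map() is called once per input record. Here the job's input is the training data set, in the following format: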
Label,x1,x2,...,xn
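The first column is the sample's class; the remaining columns are its attributes.
1) In setup(), read the test data, which has the following format: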
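Parsing one training record in this format might look like the following. This is a sketch under the assumption that every attribute is numeric and that no header line is present; parse_train_line is a hypothetical helper name.

```python
def parse_train_line(line):
    """Split 'Label,x1,x2,...,xn' into (label, [x1, ..., xn])."""
    fields = line.strip().split(",")
    return fields[0], [float(x) for x in fields[1:]]

print(parse_train_line("7,0,128,255\n"))
```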
x1,x2,...,xn
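At the same time, initialize a list List<Type,Distance> with one entry per test sample to hold that sample's nearest-neighbor data: the classes of its k nearest training samples and the corresponding distances.
2) Next, map() reads each training sample and uses it to update the nearest-neighbor data of every test sample, i.e. List<Type,Distance>. Initially Type=[-1,-1,-1,...,-1] and Distance=[MaxDouble,MaxDouble,...,MaxDouble].
The update works as follows:
Given the current training sample tr_i with class tr_type, and the current test sample te_i with its te_i_<type[],distance[]>: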
a. Compute the distance distance_tr_te between tr_i and te_i;
b. Scan distance[] of te_i_<type[],distance[]>:
find the largest value max_distance in distance[] and its index index_max_distance;
if max_distance > distance_tr_te then
// replace the farthest of the current k neighbors
distance[index_max_distance] = distance_tr_te;
type[index_max_distance] = tr_type;
end if
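3) Finally, cleanup() outputs every record in List<Type,Distance>. The test data is assumed to be a single file, so the list index can serve directly as each test sample's id. The Mapper's final output is therefore <id,<type[],distance[]>>, where each id identifies one test sample and <type[],distance[]> holds that sample's nearest classes and the corresponding distances.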
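The three Mapper steps can be sketched as a small in-memory class. This only imitates the setup/map/cleanup lifecycle and is not Hadoop code or the repository's implementation: KnnMapperSketch is a hypothetical name, Euclidean distance is an assumption, and math.inf stands in for MaxDouble.

```python
import math

class KnnMapperSketch:
    """In-memory stand-in for the Mapper's setup/map/cleanup lifecycle."""

    def __init__(self, test_samples, k):
        # setup(): one slot per test sample, pre-filled with sentinels.
        self.test_samples = test_samples
        self.types = [[-1] * k for _ in test_samples]
        self.distances = [[math.inf] * k for _ in test_samples]

    def map(self, train_label, train_features):
        # map(): one training record updates every test sample's neighbors.
        for i, test_features in enumerate(self.test_samples):
            d = math.dist(train_features, test_features)
            dist = self.distances[i]
            worst = max(range(len(dist)), key=dist.__getitem__)
            if dist[worst] > d:
                # Replace the farthest of the current k neighbors.
                dist[worst] = d
                self.types[i][worst] = train_label

    def cleanup(self):
        # cleanup(): emit <id, (type[], distance[])>, id being the list index.
        for i in range(len(self.test_samples)):
            yield i, (list(self.types[i]), list(self.distances[i]))

mapper = KnnMapperSketch([[0.0, 0.0], [5.0, 5.0]], k=2)
for label, feats in [(0, [0.0, 1.0]), (1, [5.0, 4.0]),
                     (0, [1.0, 0.0]), (1, [4.0, 5.0])]:
    mapper.map(label, feats)
result = dict(mapper.cleanup())
print(result)
```

Each call to map() costs one distance computation plus an O(k) scan per test sample. Because nothing is emitted until cleanup(), every Mapper has to hold the whole test set and its neighbor lists in memory.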
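2.2. Reducer design
The Reducer only needs a reduce function. For a given test sample id, reduce gathers the nearest-neighbor classes and distances emitted by all Mappers, re-sorts them to keep the k classes with the smallest distances, then takes the mode of those k classes as the final class of test sample id and outputs it.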
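A sketch of that reduce step, in plain Python rather than the Hadoop API. knn_reduce is a hypothetical name, and skipping slots that still hold the -1 sentinel is an assumption added here so that unfilled slots never get a vote.

```python
from collections import Counter

def knn_reduce(test_id, partial_lists, k):
    """partial_lists holds one (types, distances) pair per Mapper output
    for this test sample; return (test_id, predicted class)."""
    candidates = []
    for types, distances in partial_lists:
        candidates.extend(zip(distances, types))
    # Assumption: slots still holding the -1 sentinel are ignored.
    candidates = [c for c in candidates if c[1] != -1]
    candidates.sort(key=lambda c: c[0])
    top_k = candidates[:k]
    if not top_k:
        return test_id, -1
    votes = Counter(t for _, t in top_k)
    return test_id, votes.most_common(1)[0][0]

# Two Mappers each contributed two candidates for test sample 7.
parts = [([3, 5], [0.9, 0.4]), ([3, 3], [0.2, 0.7])]
print(knn_reduce(7, parts, k=3))
```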
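2.3. Combiner design
A Combiner can be added here. A Combiner is a reduce step run on the map side: it aggregates the data once before it is sent to the reducers, which reduces network IO. A Combiner's input and output formats must match, so the input is <id,<type[],distance[]>> and the output must have the same format. The Reducer's merge-and-trim code can be copied over directly; the only difference is that the mode is not computed here.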
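A Combiner along those lines could look like this sketch (plain Python with the hypothetical name knn_combine, not Hadoop code). It merges several partial neighbor lists into one list of at most k entries and returns it in the same <id,<type[],distance[]>> shape it received.

```python
def knn_combine(test_id, partial_lists, k):
    """Merge partial (types, distances) pairs and keep the k nearest.
    The output has the same shape as each input value."""
    candidates = []
    for types, distances in partial_lists:
        candidates.extend(zip(distances, types))
    candidates.sort(key=lambda c: c[0])
    top_k = candidates[:k]
    return test_id, ([t for _, t in top_k], [d for d, _ in top_k])

parts = [([3, 5], [0.9, 0.4]), ([3, 3], [0.2, 0.7])]
print(knn_combine(7, parts, k=2))
```

Hadoop may run a combiner zero, one or several times, so the merge has to give the same final answer either way. Keeping the k smallest distances satisfies that: trimming a subset to its k nearest never discards a neighbor that belongs in the overall top k.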
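3. Testing
The data comes from Kaggle's digit recognition competition; download: https://www.kaggle.com/c/digit-recognizer/data
To test the algorithm I first used small subsets: 18 test samples, with just over 50 training samples in the first run and 8.9M of training data (about 5k samples) in the second.
Comparison of the results: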
![Predicted vs. actual classes](http://img.blog.csdn.net/20150728105251286?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center)
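The test data above was cut from the training data with the class column removed. In the two figures above, the left column is the predicted class and the right column the actual class. Figure 1 (little training data) shows low prediction accuracy, but once the training data is increased (the 8.9M in Figure 2) the predictions are very good (100% accuracy).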
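The accuracy quoted here is simply the share of test samples whose predicted class equals the actual class. A minimal sketch follows; the label lists are invented for illustration and are not the values from the figures.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted class equals the actual one."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Illustrative labels only.
print(accuracy([1, 0, 7, 3], [1, 0, 7, 8]))
```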
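I then ran the same algorithm on the full Kaggle data (training set 70M, test set 40M), which can be downloaded from the URL above.
Hadoop cluster:
Local cluster:
node101 (master node: namenode, datanode, ResourceManager, NodeManager, SecondaryNamenode; 3.7G, 2 cores)
node102 (data/compute node: datanode, NodeManager; 1G, 1 core)
Both machines are virtual machines; the host runs Windows 7 with 4 cores and 8G of memory.
Elapsed time: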
[Screenshot of the job's elapsed time; embedded image data omitted]
Submitting the predictions to the Kaggle site gives the prediction accuracy, shown below:
[Screenshot of the Kaggle submission result; embedded image data omitted]
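The accuracy is about 92%, and the rank is 557, which is quite far down the leaderboard.
4. Summary
1. KNN has to compute the distance between every training sample and every test sample, so it is slow (here 110M of data in total took about 11 hours). No tuning was done; the algorithm could be improved by following published work such as "Efficient Parallel kNN Joins for Large Data in MapReduce".
2. Different problems call for different algorithms; for Kaggle's digit recognition, random forest is the more common choice.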
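A rough count shows why the brute-force approach in point 1 is slow. The post gives file sizes only, so the figures below are assumptions: to my knowledge the Kaggle Digit Recognizer set has 42,000 training rows and 28,000 test rows of 784 pixel values each. brute_force_cost is a hypothetical helper.

```python
def brute_force_cost(n_train, n_test, n_features):
    """Return (number of train/test pairs, per-feature operations)."""
    pairs = n_train * n_test
    return pairs, pairs * n_features

# Assumed data set sizes; not stated in the post.
pairs, ops = brute_force_cost(42_000, 28_000, 784)
print(f"{pairs:,} distance computations, {ops:,} per-feature operations")
```

Under these assumptions the job performs about 1.2 billion distance computations, each over 784 dimensions, on a cluster with three cores in total.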
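Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission.
KNN on Hadoop and a test on the Kaggle digit recognition data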
Original article: http://blog.csdn.net/fansy1990/article/details/47102073