
Python data cleaning: Chinese characters in a CSV file

Posted: 2016-04-18 17:16:28


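  Data cleaning: using Python to clean a CSV file that contains Chinese characters. The idea is to encode the Chinese strings with dictionaries: the key is a Chinese string and the value is an integer index that simply increments for each new string. Because a dictionary cannot hold duplicate keys, every distinct Chinese string is mapped to exactly one numeric index.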
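The dictionary idea on its own can be sketched on a plain list of strings. This is a minimal illustration, not the author's code: the function name `encode` and the city names are made up, and codes are assigned in order of first appearance.

```python
def encode(values):
    """Map each distinct string to an integer, in order of first appearance."""
    codes = {}      # string -> integer code; dict keys are unique
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)   # next free index for a new string
        encoded.append(codes[value])    # repeated strings reuse their code
    return encoded, codes


# Hypothetical sample data
cities = ["北京", "上海", "北京", "广州", "上海"]
encoded, codes = encode(cities)
print(encoded)  # [0, 1, 0, 2, 1]
print(codes)    # {'北京': 0, '上海': 1, '广州': 2}
```

Using `len(codes)` as the next index keeps the codes dense (0, 1, 2, ...) without a separate counter variable.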

  The Python code is below (the data is a CSV file):

import csv

# Zero-based indices of the columns that hold Chinese text
# (spreadsheet columns C, E, Z, AA, AB, AL, AM, AO, AP, AQ, AT, AX).
COLUMNS = [2, 4, 25, 26, 27, 37, 38, 40, 41, 42, 45, 49]

# One dictionary per column: Chinese string -> integer code.
mappings = {col: {} for col in COLUMNS}

# No encoding is passed, so open() uses the platform default
# (e.g. cp936/GBK on Chinese-locale Windows); pass encoding="utf-8"
# or encoding="gbk" explicitly if the file was saved differently.
with open("E:/yuce/b.csv", "r", newline="") as csv_file_read, \
        open("E:/yuce/test.csv", "w", newline="") as csv_file_write:
    reader = csv.reader(csv_file_read)
    writer = csv.writer(csv_file_write)
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        for col, mapping in mappings.items():
            # Give a value a new code only the first time it appears,
            # so identical strings always receive the same code.
            if row[col] not in mapping:
                mapping[row[col]] = len(mapping)
            row[col] = mapping[row[col]]
        writer.writerow(row)

# The with statement closes both files; no explicit close() is needed.
print("done!")
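To try the same per-column encoding without the E:/yuce files, here is a minimal sketch that runs it on an in-memory CSV string. The function name `encode_csv`, the column list and the sample rows are hypothetical, chosen only for illustration; the real file has far more columns.

```python
import csv
import io


def encode_csv(text, columns):
    """Encode the given columns of CSV text; return (new_text, mappings)."""
    mappings = {col: {} for col in columns}   # one dict per column
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))             # header row copied unchanged
    for row in reader:
        for col in columns:
            mapping = mappings[col]
            if row[col] not in mapping:       # first time this value is seen
                mapping[row[col]] = len(mapping)
            row[col] = mapping[row[col]]
        writer.writerow(row)
    return out.getvalue(), mappings


# Hypothetical sample: column 1 is a city, column 2 a level
sample = "id,city,level\n1,北京,高\n2,上海,低\n3,北京,低\n"
encoded, mappings = encode_csv(sample, [1, 2])
print(encoded)
# id,city,level
# 1,0,0
# 2,1,1
# 3,0,1
```

Keeping the per-column `mappings` around also makes it possible to translate the codes back to the original strings later, for example by inverting each dictionary.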


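  The example above comes from a real data-processing task: the raw data has two hundred attribute columns and thirty thousand records. It contains Chinese text as well as missing values, so it has to be cleaned step by step.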
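As one possible first step for the missing values, the sketch below counts empty cells per column. It assumes missing values appear as empty or whitespace-only cells; placeholders such as "NA" would have to be added to the check. The function `count_missing` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def count_missing(text):
    """Count empty (or whitespace-only) cells for each header column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    missing = Counter()
    for row in reader:
        # Pad short rows so that trailing missing cells are counted too.
        row = row + [""] * (len(header) - len(row))
        for name, cell in zip(header, row):
            if cell.strip() == "":   # assumption: empty cell means missing
                missing[name] += 1
    return missing


# Hypothetical sample with gaps in the city and level columns
sample = "id,city,level\n1,北京,\n2,,低\n3,北京,\n"
print(dict(count_missing(sample)))  # {'level': 2, 'city': 1}
```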

 


Original article: http://www.cnblogs.com/rongyux/p/5404917.html
