Developing a CSV comparison feature in Python: summary of related knowledge points and a demo

Date: 2015-10-29 10:55:08

Python 2.7

doc demo:

# -*- coding: utf-8 -*-
import csv

# Write two rows using a space delimiter and '|' as the quote character.
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
# Read the rows back; each row is a list of strings.
with open('eggs.csv', 'rb') as csvfile:
    for row in csv.reader(csvfile, delimiter=' ', quotechar='|'):
        print row
        for x in row:
            print x

# In Python 2, files passed to the csv module should be opened in binary mode ('wb'/'rb').
with open('names.csv', 'wb') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': '中Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})

with open('names.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # print(",".join(row['first_name']).decode('GBK'), row['last_name'])
        # In Python 2, print(a, b) prints a tuple with escaped bytes, so use the statement form.
        print row['first_name'], row['last_name']
        # print(row['first_name'].decode('GBK').encode('UTF-8'), row['last_name'])

https://docs.python.org/2/library/csv.html#csv-examples
https://docs.python.org/2.7/tutorial/datastructures.html#dictionaries
https://docs.python.org/2.7/library/stdtypes.html#bltin-file-objects

    def is_same_csv_file(self, compare_csv_files_path, baseline_csv_files_path, csv_file_name):
        # get_csv_files and get_csv_reader are helper methods of the enclosing class (not shown here).
        with open(self.get_csv_files(baseline_csv_files_path, csv_file_name), 'rb') as baseline_file:
            base_line_count = len(baseline_file.readlines())
        with open(self.get_csv_files(compare_csv_files_path, csv_file_name), 'rb') as compare_file:
            compare_line_count = len(compare_file.readlines())
        if base_line_count != compare_line_count:
            print("line_num is not equal\nbase_line_count:%d\ncompare_line_count:%d" % (base_line_count,
                                                                                       compare_line_count))
            return False

        baseline_reader = self.get_csv_reader(baseline_csv_files_path, csv_file_name)
        # A csv reader can only be iterated once, so load the rows to compare into a list first.
        compare_rows = list(self.get_csv_reader(compare_csv_files_path, csv_file_name))
        for base_row in baseline_reader:
            if self.is_base_record_exist(base_row, compare_rows):
                continue
            else:
                print("Missing record:line_num:%d" % baseline_reader.line_num)
                print("Expected Data:")
                print(",".join(base_row).decode('gb2312'))
                return False
        return True

    @staticmethod
    def is_base_record_exist(base_row, compare_rows):
        # Return True if base_row appears anywhere in the rows being compared.
        for result_row in compare_rows:
            if base_row == result_row:
                return True
        return False
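
The comparison idea above can also be tried as a self-contained script. The following is only a minimal sketch, not the author's implementation: it assumes Python 3 (the article itself targets Python 2.7), UTF-8 files, and hypothetical helper names (read_rows, diff_csv, write_rows). It treats each file as a multiset of rows, so row order does not matter but duplicate rows are counted.

```python
import csv
import os
import tempfile
from collections import Counter


def read_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file as a list of tuples."""
    # newline="" is the mode the csv module documentation recommends for csv files.
    with open(path, newline="", encoding=encoding) as f:
        return [tuple(row) for row in csv.reader(f)]


def diff_csv(baseline_path, compare_path, encoding="utf-8"):
    """Return (missing, extra): rows only in the baseline, rows only in the compared file."""
    baseline = Counter(read_rows(baseline_path, encoding))
    compare = Counter(read_rows(compare_path, encoding))
    missing = list((baseline - compare).elements())
    extra = list((compare - baseline).elements())
    return missing, extra


def write_rows(path, rows):
    """Write rows to a CSV file (used here only to create sample data)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Usage on two small temporary files with the same rows in a different order,
# except for one changed record.
tmp_dir = tempfile.mkdtemp()
base_path = os.path.join(tmp_dir, "baseline.csv")
cmp_path = os.path.join(tmp_dir, "compare.csv")
write_rows(base_path, [["1", "测试 1"], ["2", "测试 2"], ["3", "测试 3"]])
write_rows(cmp_path, [["3", "测试 3"], ["1", "测试 1"], ["2", "changed"]])

missing, extra = diff_csv(base_path, cmp_path)
print("missing:", missing)
print("extra:", extra)
```

Two files can be considered the same when both returned lists are empty. Reporting the extra rows as well as the missing ones is a design choice of this sketch; the method above only reports the first missing record.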

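Related knowledge points:
String formatting (the % operator)

Template
When formatting a string, Python uses a string as a template. The template contains format specifiers, which reserve positions for the actual values and describe the format in which each value should be presented. Python passes multiple values to the template in a tuple, one value per format specifier.
For example:

print("I'm %s. I'm %d years old" % ('Vamei', 99))
In the example above,

"I'm %s. I'm %d years old" is our template. %s is the first format specifier and stands for a string. %d is the second format specifier and stands for an integer. The two elements of ('Vamei', 99), 'Vamei' and 99, are the actual values that replace %s and %d.
The % sign between the template and the tuple denotes the formatting operation.

The whole of "I'm %s. I'm %d years old" % ('Vamei', 99) is in fact a string expression. We can assign it to a variable like any ordinary string. For example:

a = "I'm %s. I'm %d years old" % ('Vamei', 99)
print(a)

We can also pass the actual values in a dictionary, as follows:

print("I'm %(name)s. I'm %(age)d years old" % {'name':'Vamei', 'age':99})
As you can see, the two format specifiers are named. Each name is enclosed in () and corresponds to a key of the dictionary.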

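Format specifiers

A format specifier reserves a position for an actual value and controls how it is displayed. It can include a type code that controls the displayed type, as follows:
%s string (displayed using str())
%r string (displayed using repr())
%c single character
%d decimal integer
%i decimal integer
%o octal integer
%x hexadecimal integer
%e exponent notation (exponent marker written as e)
%E exponent notation (exponent marker written as E)
%f floating-point number
%F floating-point number, same as above
%g exponent (e) or floating-point notation (depending on the magnitude and precision)
%G exponent (E) or floating-point notation (depending on the magnitude and precision)
(%b for binary integers is not supported by the % operator on strings; use bin() or format(n, 'b') instead.)

%% the character "%"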
http://www.cnblogs.com/vamei/archive/2013/03/12/2954938.html
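
To make the type codes concrete, here is a short sketch that applies several of them. It is written for Python 3 (an assumption; the article targets Python 2.7, where these particular expressions behave the same way), and the variable names are made up for illustration.

```python
name, age = "Vamei", 99

# Positional values are passed as a tuple.
by_tuple = "I'm %s. I'm %d years old" % (name, age)

# Named values are passed as a dict.
by_dict = "I'm %(name)s. I'm %(age)d years old" % {"name": name, "age": age}

hex_text = "%x" % 255            # hexadecimal integer
oct_text = "%o" % 8              # octal integer
float_text = "%.2f" % 3.14159    # floating point, two digits after the decimal point
exp_text = "%e" % 12345.678      # exponent notation
percent_text = "%d%%" % 50       # %% produces a literal percent sign
repr_text = "%r" % "a"           # repr() of the value, including the quotes

# The % operator on strings has no binary type code: %b raises ValueError.
try:
    "%b" % 5
    binary_supported = True
except ValueError:
    binary_supported = False
binary_text = format(5, "b")     # the built-in format() does support binary

print(by_tuple)
print(hex_text, oct_text, float_text, exp_text, percent_text, repr_text)
print(binary_supported, binary_text)
```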

Garbled Chinese characters:

import csv

def main():
    with open('testfile0.csv', 'rb') as f:
        rd = csv.reader(f)
        for r in rd:

            # wrong
            print str(r).decode('gb2312')

            # right
            print ', '.join(r).decode('gb2312')

if __name__ == '__main__':
    main()

Output:
['1', '\xb2\xe2\xca\xd4 1']
1, 测试 1
['2', '\xb2\xe2\xca\xd4 2']
2, 测试 2
['3', '\xb2\xe2\xca\xd4 3']
3, 测试 3
......

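In short, str() should not be used here.
str() applied to a list shows each element with repr(), so non-ASCII bytes are written out as escape sequences. We read '\xb2\xe2\xca\xd4' as the two characters 测试, but in the output of str() it is just 16 ASCII characters, and decoding them changes nothing.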

http://bbs.chinaunix.net/thread-3668820-1-1.html
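
The same effect can be reproduced without a CSV file. This is a minimal sketch in Python 3 (an assumption; there the raw fields are bytes objects rather than Python 2 str), using one hand-written row that mirrors the output shown above.

```python
# One CSV row as raw GB2312-encoded bytes, like the rows in the output above.
row = [b"1", b"\xb2\xe2\xca\xd4 1"]

# str() on a list uses repr() for each element, so the bytes stay escaped.
as_list_text = str(row)

# Joining the fields first and then decoding yields readable text.
as_text = b", ".join(row).decode("gb2312")

print(as_list_text)
print(as_text)
```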

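The first thing to understand is that a plain string in Python 2 is a byte string of type str, and the default encoding used for implicit conversions is ASCII, so problems arise when it handles Chinese characters.
Python's internal text representation is unicode. Prefixing a string literal with u declares a unicode string directly; for example, u'hello' is of type unicode.
If the string being processed contains non-ASCII characters, convert it to unicode to avoid errors. The available methods are:
decode(): converts a string in another encoding to unicode, e.g. str1.decode('gb2312') converts the gb2312-encoded string str1 to unicode;
encode(): converts unicode to a string in another encoding, e.g. str2.encode('gb2312') converts the unicode string str2 to gb2312;
unicode(): same as decode(), converts a string in another encoding to unicode, e.g. unicode(str3, 'gb2312') converts the gb2312-encoded string str3 to unicode.
When transcoding, first be clear about which encoding the string str is in, then decode it to unicode, and finally encode it to the other encoding.
In addition, calling decode() on a string that is already unicode can raise an error, because Python 2 first implicitly encodes it as ASCII. So when the encoding is unknown, first check whether the string is unicode, which can be done with isinstance(str, unicode).
Not only for Chinese: whenever you handle strings containing non-ASCII characters, you can follow these steps:
1. Determine the encoding of the source string, say utf8;
2. Convert it to unicode with unicode() or decode(), e.g. str1.decode('utf8') or unicode(str1, 'utf8');
3. Encode the processed string to the target format with encode().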

http://blog.csdn.net/devil_2009/article/details/39526713
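
The three steps can be sketched as a small function. This is an illustration only, written for Python 3 (an assumption): there bytes plays the role of the Python 2 str and str plays the role of unicode, so the isinstance check is against str. The function name transcode is hypothetical.

```python
def transcode(data, source_encoding, target_encoding):
    """Decode bytes to text, then encode the text in another encoding."""
    if isinstance(data, str):
        # Already text: there is nothing to decode.
        text = data
    else:
        # Steps 1 and 2: the caller states the source encoding, and we decode with it.
        text = data.decode(source_encoding)
    # Step 3: encode the text in the target encoding.
    return text.encode(target_encoding)


utf8_bytes = "测试".encode("utf8")
gb_bytes = transcode(utf8_bytes, "utf8", "gb2312")
print(gb_bytes)
```

Any processing of the text would go between the decode and the encode; the sketch leaves it out to keep the round trip visible.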

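The meaning of underscores in Python names

Python not only uses its peculiar whitespace to delimit code blocks; it also uses underscores in variable and function names to carry special meanings, summarized below:
1. _ single leading underscore: a weak "internal use" marker. For example, "from M import *" does not import objects whose names start with an underscore, including packages, modules, and members.
2. Single trailing underscore _: used only to avoid naming conflicts with Python keywords.
3. __ double leading underscore: marks a class member as private; it cannot be accessed directly under that name from outside the class.
4. __ double leading and double trailing underscores __: special functions or attributes in Python, such as __name__, __doc__, __init__, __import__, __file__, __setattr__, __getattr__, __dict__, and so on. This style is not recommended for your own variable, function, and method names.
In addition, Python has no strict member access restrictions like C++ or Java. A member with a double leading underscore is marked as a private class member, but it is in fact only pseudo-private and can be accessed directly by other means. For example: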
class T(object):
    def __init__(self):
        self.__name = 'Kitty'

t = T()
print t.__name
Direct access raises an error (although, surprisingly, the access succeeded in command-line mode, and I don't know why). But try another way:

print t._T__name
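This access succeeds, which is why some people say this implementation reflects Python's flexibility: a double leading underscore is really just a naming convention for programmers to follow.
The t._T__name form comes from Python's private name mangling technique;
for details, see http://blog.csdn.net/carolzhang8406/article/details/6859480.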
http://blog.csdn.net/devil_2009/article/details/39619413
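
The following sketch, written for Python 3 (an assumption; the mangling rule is the same in Python 2.7), shows where the attribute actually ends up. The variable names direct_access_ok and mangled_value are made up for illustration.

```python
class T(object):
    def __init__(self):
        self.__name = "Kitty"  # stored on the instance as _T__name


t = T()

# Outside the class body the name is not mangled, so the lookup fails.
try:
    t.__name
    direct_access_ok = True
except AttributeError:
    direct_access_ok = False

# The mangled name is an ordinary attribute and can be read directly.
mangled_value = t._T__name

print(direct_access_ok, mangled_value, sorted(vars(t)))
```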

Python: file/file-like object methods explained in detail [reading and writing a single file]
http://blog.csdn.net/zhanh1218/article/details/27112467

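Comparing whether two instances (objects) of a Python class are equal

For the same class, we can create different instances. How do we compare whether two instances are equal? By default, Python decides whether two objects are equal by checking whether they are at the same address in memory. If the memory address is the same, they are certainly equal. This typically happens when one name is a reference to the same object as another.
In real development, however, whether two objects are equal is usually decided not by memory address but by comparing some or all of their attribute values.
Suppose there is a staff class, and we create two objects from it.

class Staff(object):
    def __init__(self,id,name,sex):
        self.id=id
        self.name=name
        self.sex=sex

We take the view that if the id is the same, the two objects are equal. Let id stand for a national ID card number: if the ID number is the same, it must be the same person. Situations like this do come up in real projects.
Create the objects and look at their memory addresses: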

staff1=Staff("123456","张三","男")
staff2=Staff("123456","李四","女")
print id(staff1),id(staff2)
#12327248 12327184

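The result is clear: the two objects are at different memory addresses, so at this point staff1==staff2 is certainly False.

How can we meet our requirement that objects with the same id are considered equal? There are the following approaches:
1. Override the __eq__ method of the Staff class

class Staff(object):
    def __init__(self,id,name,sex):
        self.id=id
        self.name=name
        self.sex=sex

    def __eq__(self,other):
        return self.id==other.id

    def __ne__(self,other):
        # Python 2 does not derive != from __eq__, so define it explicitly.
        return not self.__eq__(other)

staff1=Staff("123456","张三","男")
staff2=Staff("123456","李四","女")
print id(staff1),id(staff2)
print staff1==staff2
#True

The result is True, which means the two objects are equal, even though their memory addresses are certainly different. Here the __eq__ method is overridden; you can of course add more comparison conditions, as the example compares only id. In the same way, you can implement operations such as adding two objects by defining the __add__ method.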

2. Compare attribute values directly

staff1=Staff("123456","张三","男")
staff2=Staff("123456","李四","女")
print id(staff1),id(staff2)

print staff1.__dict__
print staff2.__dict__

if staff1.__dict__['id']==staff2.__dict__['id']:
    print 'yes,equal'

You will find that this works too, and multiple attributes can be compared in the same way. The key point is the use of the built-in __dict__ attribute of Python instances.

http://www.yihaomen.com/article/python/281.htm
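
As a complement to the two approaches, here is a sketch in Python 3 (an assumption; the article uses Python 2.7). It adds two things that the article does not cover and that are choices of this sketch: a type check that returns NotImplemented for foreign objects, and a __hash__ that agrees with __eq__ so that equal staff objects collapse in sets and dicts. The third object staff3 is made up for illustration.

```python
class Staff(object):
    def __init__(self, id, name, sex):
        self.id = id
        self.name = name
        self.sex = sex

    def __eq__(self, other):
        # Only compare with other Staff objects; let Python handle anything else.
        if not isinstance(other, Staff):
            return NotImplemented
        return self.id == other.id

    def __hash__(self):
        # Keep hashing consistent with __eq__: equal objects must hash equally.
        return hash(self.id)


staff1 = Staff("123456", "张三", "男")
staff2 = Staff("123456", "李四", "女")
staff3 = Staff("654321", "张三", "男")

print(staff1 == staff2)                  # same id, so equal
print(staff1 is staff2)                  # but still two distinct objects
print(staff1 != staff3)                  # Python 3 derives != from __eq__
print(len({staff1, staff2, staff3}))     # the set keeps one object per id
```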

Original article: http://www.cnblogs.com/softidea/p/4919596.html
