Python File Handling

Date: 2015-10-29 11:03:49


6. File handling

Reading a file

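    # Python 2 syntax, as in the rest of these notes
    f = open('myfile.txt', 'r')
    for line in f.readlines():
        # drop the trailing newline, then split the line into fields on ':'
        line = line.strip('\n').split(':')
        print line
    f.close()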

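Data written to a file first goes into an in-memory buffer and reaches the disk only once the buffer fills up. These notes give 1024 bytes as the threshold; the actual buffer size depends on the platform and on the buffering argument passed to open().

To force the buffered data out immediately, call f.flush().

In the read loop above, line ends up as a list, because split() returns a list.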
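The buffering behaviour can be observed directly. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in these notes; the file name buffer_demo.txt and the helper write_and_check are made up for illustration. It compares the on-disk size of a file before and after flush().

```python
import os
import tempfile

def write_and_check(path):
    """Write a short line; return the file size seen before and after flush()."""
    with open(path, "w") as f:
        f.write("hello\n")                  # a small write normally stays in the buffer
        size_before = os.path.getsize(path)
        f.flush()                           # hand the buffered data to the operating system
        size_after = os.path.getsize(path)
    return size_before, size_after

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "buffer_demo.txt")
before, after = write_and_check(path)
print(before, after)
```

Note that flush() only passes the data to the operating system. If the data must survive a crash, os.fsync(f.fileno()) can be called after flush() to ask the OS to write it to the device.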

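r    read-only mode

w    write-only mode (an existing file is truncated)

a    append mode

r+b  read and write in binary mode. Binary mode matters because Linux and Windows use different line endings (\n versus \r\n); the dos2unix tool converts Windows line endings to Unix ones.

w+b  write and read in binary mode (the file is truncated first)

a+b  append and read in binary mode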
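As a quick illustration of the modes listed above, here is a small sketch for Python 3. The file name modes_demo.txt is hypothetical; everything else is the built-in open().

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:       # "w": create the file, or truncate an existing one
    f.write("first\n")

with open(path, "a") as f:       # "a": every write goes to the end of the file
    f.write("second\n")

with open(path, "r") as f:       # "r": read-only, text mode
    text = f.read()

with open(path, "r+b") as f:     # "r+b": read and write in place, as raw bytes
    f.write(b"FIRST")            # overwrites the first five bytes, nothing is truncated

with open(path, "rb") as f:      # "rb": bytes exactly as stored, no newline translation
    raw = f.read()

print(repr(text))
print(repr(raw))
```

In text mode Python translates line endings on the way in and out, which is why binary mode is the safer choice when the exact bytes matter.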

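Common methods of file objects:

File attributes and methods

f.closed  # whether the file has been closed; f.close() is the method that closes it

f.encoding  # the file's encoding

f.flush()  # write buffered data out

f.readline([size])  # read one line; if size is given, the result may be only part of a line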
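The effect of the size argument of readline() is easiest to see on a small input. This sketch is for Python 3 and uses io.StringIO as a stand-in for a real text file, which is an assumption made only to keep the example self-contained.

```python
import io

# An in-memory text "file" with two lines.
f = io.StringIO("a fairly long first line\nshort\n")

part = f.readline(8)     # at most 8 characters, so only the start of line one
rest = f.readline()      # the remainder of the same line, newline included
second = f.readline()    # the next full line
end = f.readline()       # an empty string signals end of file

print(repr(part), repr(rest), repr(second), repr(end))
```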

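f.readline / f.readlines / f.xreadlines

file.readlines() reads the entire file into memory and splits it into a list of lines. For a large file this takes a lot of memory, so it is a poor choice there.

Since Python 2.3, file objects also support iteration, so the following two snippets do roughly the same thing: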

    with open('foo.txt', 'r') as f:
        for line in f.readlines():
            pass  # do_something(line)

    with open('foo.txt', 'r') as f:
        for line in f:
            pass  # do_something(line)

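The second form, however, uses less memory and is smarter (how exactly depends on the implementation of Python's file object): the content is read from the buffer automatically as it is needed. It is the form to prefer.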
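One practical consequence of iterating over the file object is that processing can stop early without the rest of the file ever being loaded. A minimal sketch for Python 3 follows; the helper first_line_containing and the sample file are hypothetical.

```python
import os
import tempfile

def first_line_containing(path, needle):
    """Return (line_number, line) for the first line containing needle, or None."""
    with open(path, "r") as f:
        for number, line in enumerate(f, 1):   # lines are fetched one at a time
            if needle in line:
                return number, line.rstrip("\n")
    return None

# Hypothetical three-line sample file.
demo = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(demo, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

print(first_line_containing(demo, "bet"))
```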

file.xreadlines() simply returns an iterator equivalent to iter(file). It has been deprecated since Python 2.3; use this form instead:

    for line in f:
        pass  # do_something(line)

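f.seek / f.tell

f.seek(offset[, whence]) moves the current position in the file; f.tell() returns the current position.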
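A short sketch of how seek() and tell() interact, written for Python 3. The file is opened in binary mode because Python 3 text files do not allow non-zero seeks relative to the current position or the end; the file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    start = f.tell()              # position right after opening
    f.seek(4)                     # absolute offset from the start of the file
    chunk = f.read(2)             # two bytes starting at offset 4
    after_read = f.tell()         # reading advanced the position
    f.seek(-3, os.SEEK_END)       # three bytes before the end
    tail = f.read()               # everything up to the end
    f.seek(-5, os.SEEK_CUR)       # five bytes back from the current position
    middle = f.read(1)

print(start, chunk, after_read, tail, middle)
```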

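f.truncate: truncating a file

The truncate() method truncates the file. If the optional size argument is given, the file is truncated to (at most) that size.

size defaults to the current position. The current file position is not changed. Note that if size exceeds the file's current size, the result is platform-dependent.

Note: this method does not work on a file opened in read-only mode.

The syntax of truncate() is:

fileObject.truncate([ size ])

size -- optional; if given, the size that the file is truncated to (at most).

In Python 2 this method does not return a value.

The following example shows truncate() in use.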

    #!/usr/bin/python

    # Open a file for reading and writing ("r+"; "rw+" is not a valid mode)
    fo = open("foo.txt", "r+")
    print "Name of the file: ", fo.name

    # Assuming file has following 5 lines
    # This is 1st line
    # This is 2nd line
    # This is 3rd line
    # This is 4th line
    # This is 5th line

    line = fo.readline()
    print "Read Line: %s" % (line)

    # Now truncate remaining file.
    fo.truncate()

    # Try to read file now
    line = fo.readline()
    print "Read Line: %s" % (line)

    # Close opened file
    fo.close()

Running the program above produces the following output:

Name of the file:  foo.txt
Read Line: This is 1st line

Read Line:
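For comparison, here is a sketch of the same idea in Python 3, where truncate() returns the new size instead of nothing. The file is opened in binary mode so that positions and sizes are plain byte counts; the path and the sample content are assumptions for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "wb") as f:
    f.write(b"This is 1st line\nThis is 2nd line\nThis is 3rd line\n")

with open(path, "r+b") as f:
    first = f.readline()         # read the first line; the position is now just after it
    f.truncate()                 # no size given: cut the file at the current position
    remaining = f.read()         # nothing is left to read

size_after_first = os.path.getsize(path)

with open(path, "r+b") as f:
    new_size = f.truncate(4)     # explicit size: keep only the first four bytes

with open(path, "rb") as f:
    content = f.read()

print(first, remaining, size_after_first, new_size, content)
```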

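7. Best practice for incremental log processing

Calling file() (or open()) only obtains a file handle; no data is read until it is requested. Two methods control where in the file reading happens: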

f.seek

f.tell
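One common way to use seek() and tell() for incremental processing is to remember the offset where the previous run stopped and to resume from there. The sketch below is for Python 3; the function read_new_lines, the file app.log, and the handling of a shrunken file are illustrative assumptions, not something taken from these notes.

```python
import os
import tempfile

def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for everything appended to path since offset."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < offset:        # file is shorter than before: assume it was rotated
            offset = 0
        f.seek(offset)               # resume where the previous call stopped
        lines = f.readlines()
        return lines, f.tell()       # tell() is the offset to pass in next time

# Hypothetical log file, written in two steps.
log = os.path.join(tempfile.mkdtemp(), "app.log")
with open(log, "wb") as f:
    f.write(b"one\ntwo\n")
first_batch, offset = read_new_lines(log)

with open(log, "ab") as f:
    f.write(b"three\n")
second_batch, offset = read_new_lines(log, offset)

print(first_batch, second_batch, offset)
```

A real collector would also store the offset between runs and deal with a final line that has not been completely written yet; both are left out here to keep the sketch short.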

An example of incremental processing: reading a file backwards, line by line

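    #!/usr/bin/env python
    # Python 2. Saved as file_backwards.py; the Apache example imports it
    # under that name.

    import sys

    def filerev(fd):
        """Yield the lines of fd from the last line to the first."""
        fd.seek(0, 2)              # start at the end of the file
        pos = -1                   # the first step back is one byte
        line = ''
        while fd.tell() > 0:
            fd.seek(pos, 1)        # step back relative to the current position
            d = fd.read(1)         # read one character (moves forward by one)
            if fd.tell() == 1:
                # that was the first character of the file: emit the last line
                yield d + line
                break
            else:
                pos = -2           # from now on: undo the read, then one more back
                if d != '\n':
                    line = d + line
                else:
                    if line:
                        yield line
                    line = d       # the newline ends the preceding line

    if __name__ == '__main__':
        with open(sys.argv[1]) as fd:
            for i in filerev(fd):
                print i,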
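The generator above relies on Python 2 text files allowing seeks relative to the current position. In Python 3 that only works in binary mode, so a port could look like the sketch below. The name filerev_bytes is hypothetical, absolute offsets are used instead of relative ones, and the one-byte-at-a-time reading is kept for clarity rather than speed.

```python
import io
import os

def filerev_bytes(fd):
    """Yield the lines of a binary file object from the last line to the first."""
    fd.seek(0, os.SEEK_END)
    pos = fd.tell()
    line = b""
    while pos > 0:
        pos -= 1
        fd.seek(pos)                 # absolute offset of the byte to inspect
        d = fd.read(1)
        if d == b"\n" and line:      # a newline closes the line before it
            yield line
            line = b""
        line = d + line
    if line:                         # the first line of the file has no newline before it
        yield line

# io.BytesIO stands in for a file opened with open(path, "rb").
sample = io.BytesIO(b"ab\ncd\nef\n")
print(list(filerev_bytes(sample)))
```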

An Apache log processing example

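    #!/usr/bin/env python
    # Python 2. Counts the requests logged per minute during the last ten
    # minutes and sends the counts to a listener on 127.0.0.1:2003.

    import sys
    import datetime
    import socket

    from file_backwards import *    # provides filerev()

    MONTH = {
        'Jan': 1,
        'Feb': 2,
        'Mar': 3,
        'Apr': 4,
        'May': 5,
        'Jun': 6,
        'Jul': 7,
        'Aug': 8,
        'Sep': 9,
        'Oct': 10,
        'Nov': 11,
        'Dec': 12,
    }

    def parse_apache_date(datestr):
        # datestr looks like 29/Oct/2015:11:03:49
        day, month, yearandtime = datestr.split('/')
        year, hour, minute, second = yearandtime.split(':')
        # seconds are dropped, so requests are grouped per minute
        return datetime.datetime(int(year), MONTH[month], int(day), int(hour), int(minute))

    def countDict(d, k):
        if k in d:
            d[k] += 1
        else:
            d[k] = 1

    def parse_apache_log(logfile, ten_m):
        result = {}
        with open(logfile) as fd:
            # read from the end so that only recent entries are parsed
            for line in filerev(fd):
                splited_line = line.split()
                datestr = splited_line[3][1:]    # strip the leading '['
                apache_date = parse_apache_date(datestr)
                if apache_date > ten_m:
                    # '%s' (seconds since the epoch) is a platform-specific
                    # strftime extension
                    countDict(result, apache_date.strftime('%s'))
                else:
                    # reached an entry older than the cutoff: stop reading
                    return result
        # every entry in the file was newer than the cutoff
        return result

    if __name__ == '__main__':
        now = datetime.datetime.now()
        timedelta = datetime.timedelta(minutes=10)
        ten_m_ago = now - timedelta
        key = 'http.count'
        data = parse_apache_log(sys.argv[1], ten_m_ago)
        sock = socket.socket()
        sock.connect(('127.0.0.1', 2003))
        print data
        # one "key value timestamp" line per minute
        for k, v in data.items():
            sock.send("%s %d %s\n" % (key, v, k))
        sock.close()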
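The date parsing and counting can be written more compactly with the standard library. The sketch below is for Python 3 and swaps in datetime.strptime for the MONTH table and collections.Counter for countDict; the function names and the sample log lines are made up for illustration. Note that the %b directive follows the current locale, so the sketch assumes English month abbreviations.

```python
from collections import Counter
from datetime import datetime

def parse_apache_time(line):
    """Return the timestamp of an access-log line, truncated to the minute."""
    datestr = line.split()[3][1:]                # e.g. 29/Oct/2015:11:03:49
    stamp = datetime.strptime(datestr, "%d/%b/%Y:%H:%M:%S")
    return stamp.replace(second=0)

def count_per_minute(lines, since):
    """Count the requests per minute among lines newer than since."""
    counts = Counter()
    for line in lines:
        stamp = parse_apache_time(line)
        if stamp > since:
            counts[stamp] += 1
    return counts

# Hypothetical log lines in the common Apache access-log layout.
sample_lines = [
    '127.0.0.1 - - [29/Oct/2015:10:40:00 +0800] "GET /old HTTP/1.1" 200 10',
    '127.0.0.1 - - [29/Oct/2015:11:03:49 +0800] "GET / HTTP/1.1" 200 612',
    '127.0.0.1 - - [29/Oct/2015:11:03:58 +0800] "GET /a HTTP/1.1" 200 10',
    '127.0.0.1 - - [29/Oct/2015:11:04:02 +0800] "GET /b HTTP/1.1" 404 5',
]
counts = count_per_minute(sample_lines, datetime(2015, 10, 29, 10, 55))
for minute, hits in sorted(counts.items()):
    print(minute, hits)
```

Unlike the script above, this version scans every line it is given; combining it with a reverse reader would restore the early exit at the first entry older than the cutoff.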


Original article: http://www.cnblogs.com/muzinan110/p/4919677.html
