Python: matching Chinese text with regular expressions

Posted: 2014-05-13 04:36:43

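Requirement: because of a bug in a tool from years ago, copying a batch of files produced a large number of stray files with names like "复件xxxxxxx" and "复件（2）XXXXX" ("复件" means "copy"). The directory tree is deep and holds a lot of files, an estimated 50 million, and it is not known how many of them are these erroneous copies, so I wrote a script that walks the tree and deletes them.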
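The two name shapes described above can be told apart from ordinary names with a short regular expression. The following is a minimal Python 3 sketch, not part of the original script; the helper name `is_copy_artifact` and the exact pattern (including the optional counter in full-width or ASCII parentheses) are assumptions made for illustration.

```python
import re

# Assumed pattern: the literal prefix "复件", optionally followed by a
# counter in full-width or ASCII parentheses, e.g. "复件（2）" or "复件(2)".
COPY_PREFIX = re.compile(r"^复件\s*(?:[（(]\s*\d+\s*[）)])?")


def is_copy_artifact(name):
    """Return True if the file name starts with the "复件" copy prefix."""
    return COPY_PREFIX.match(name) is not None


# Try the check on the two shapes from the requirement and two normal names.
for name in ["复件xxxxxxx", "复件（2）XXXXX", "photo.jpg", "我的复件.jpg"]:
    print(name, is_copy_artifact(name))
```

Anchoring on the literal prefix is narrower than matching any leading Chinese character, so legitimate files that merely have Chinese names would not be flagged.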

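#encoding=utf-8
#author: skybug
#date: 2014-05-11
#function: walk the given directory and delete images whose file names start with a Chinese character
import os,re
cnt = 0
pattern = re.compile(ur"[\u4e00-\u9fa5].*")  # match names whose first character is a Chinese character
#pattern = re.compile(ur"[\u590D][\u4EF6].*")  # stricter alternative: names starting with "复件"
def walkdir_del(srcdir):  # walk the directory tree
    global cnt
    for parent,dirs,files in os.walk(srcdir):
        for file in files:
            infile = os.path.join(parent,file)
            file = file.decode('gb2312').encode('utf8')  # transcode the file name from GB2312 to UTF-8 bytes
            file = unicode(file,'utf8')  # turn the UTF-8 bytes into a unicode string
            match = pattern.match(file)  # test the name against the pattern
            if match:  # the name starts with a Chinese character
                print infile
                os.remove(infile)  # delete the file
                cnt += 1
                print "del %s ok!"%infile
                print "del %d files"%cnt


srcdir=os.getcwd()  # start from the current working directory
walkdir_del(srcdir)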
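Since the script above deletes as it goes, it can help to count the matches before removing anything. The following is a hedged Python 3 sketch of the same traversal with a dry-run switch. The function names and the `dry_run` parameter are hypothetical and not from the original. In Python 3, `os.walk` already returns `str` names when given a `str` path, so the GB2312 decoding step is not needed.

```python
import os
import re

# Same character range as the original pattern: a name whose first
# character falls in U+4E00..U+9FA5.
CJK_START = re.compile(r"[\u4e00-\u9fa5]")


def find_cjk_named_files(srcdir):
    """Yield full paths of files whose names start with a Chinese character."""
    for parent, dirs, files in os.walk(srcdir):
        for name in files:
            if CJK_START.match(name):
                yield os.path.join(parent, name)


def delete_cjk_named_files(srcdir, dry_run=True):
    """Return the number of matching files; delete them unless dry_run is True."""
    count = 0
    for path in find_cjk_named_files(srcdir):
        if not dry_run:
            os.remove(path)
        count += 1
    return count
```

Calling `delete_cjk_named_files(os.getcwd())` first reports how many files would be removed; passing `dry_run=False` then performs the deletion.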

This article comes from the “skybug” blog; please keep this attribution: http://skybug.blog.51cto.com/132577/1409589


Tags: python, Chinese regex
