Python 3 re module

Posted: 2015-07-29 15:40:19

Python version: 3.4.3.

These are my notes from reading the official documentation:

　　https://docs.python.org/3/library/re.html?highlight=re#module-re

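Python string literals already treat the backslash as an escape character, so a literal '\' must be written '\\' in an ordinary string. Regular expression syntax in turn uses '\\' to match one backslash, so in an ordinary Python string literal you need '\\\\' to match a single '\'. That is tedious, so you can prefix the string with 'r' to make it a raw string, in which Python does not process backslash escapes.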
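To make the backslash rule concrete, here is a minimal sketch comparing the two spellings of the same pattern. The sample string `text` is made up for illustration.

```python
import re

# A made-up string that contains exactly one backslash character.
text = 'C:\\temp'          # ordinary literal: '\\' is one backslash
print(len(text))           # 7

# Two equivalent ways to write a pattern that matches one backslash.
escaped_pattern = '\\\\'   # ordinary literal: four characters typed, two stored
raw_pattern = r'\\'        # raw literal: two characters typed, two stored

print(escaped_pattern == raw_pattern)          # True
print(re.search(raw_pattern, text).start())    # 2
```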

The string analyzed in the first examples is r'doTa2fsfxczf\/sadfDOTA2s\\\\dafasdsfDOTA2wwwwwdotawwwwww'.

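re provides two simple ways to test whether a string contains a pattern: search, which scans the whole string, and match, which only matches at the beginning of the string.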

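import re
game = r'doTa2fsfxczf\/sadfDOTA2s\\\\dafasdsfDOTA2wwwwwdotawwwwww'
# The three arguments are: the regular expression, the string to match against,
# and the flags (an integer value can be used instead)
game2 = re.match('dota2', game, re.I)  # the flag makes the match case-insensitive
print(game2)
# <_sre.SRE_Match object; span=(0, 5), match='doTa2'>
# (newer Python versions print <re.Match object; ...> instead)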
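The difference between the two functions is easiest to see when the pattern does not occur at the start of the string. A small sketch, using a made-up sample string:

```python
import re

text = 'this is DOTA2'

# match() only succeeds if the pattern matches at position 0.
print(re.match('dota2', text, re.I))           # None

# search() scans forward and returns the first match anywhere in the string.
found = re.search('dota2', text, re.I)
print(found.span())                            # (8, 13)
```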

The functions in this module are as follows:

re.compile(pattern, flags): compiles a regular expression into a pattern object that can be reused many times.

import re
game = r'doTa2fsfxczf\/sadfDOTA2s\\\\dafasdsfDOTA2wwwwwdotawwwwww'
style = 'DOTA.'
rengine = re.compile(style, re.I)
gamename = rengine.findall(game)
print(gamename)
# ['doTa2', 'DOTA2', 'DOTA2', 'dotaw']

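Common flag values include:

re.A/re.ASCII	value 256; makes \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only
re.I/re.IGNORECASE	value 2; case-insensitive matching
re.DEBUG	value 128; prints debug information about the compiled expression
re.L/re.LOCALE	value 4; makes \w, \W, \b and \B depend on the current locale
re.M/re.MULTILINE	value 8; makes '^' and '$' match at the start and end of every line, not only of the whole string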
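Because the flags are integer-valued, several of them can be combined with the bitwise OR operator. The following sketch, on a made-up multi-line string, shows re.I and re.M working together:

```python
import re

text = 'dota\nDOTA2\nwar3'

# Flags are integer-valued, so they can be combined with the bitwise OR operator.
flags = re.I | re.M
print(int(flags))                              # 10

# With re.M, '^' matches at the start of every line; with re.I, case is ignored.
print(re.findall('^dota', text, flags))        # ['dota', 'DOTA']

# Without re.M, '^' matches only at the very start of the string.
print(re.findall('^dota', text, re.I))         # ['dota']
```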

re.fullmatch(pattern, string, flags): returns a match object if the whole string matches the pattern, otherwise None.

import re
temp = r'this is a temp'
print(re.fullmatch(temp, 'this is a temp'))
print(re.fullmatch(temp, 'this is b temp'))
# <_sre.SRE_Match object; span=(0, 14), match='this is a temp'>
# None
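As a quick contrast with match(), the sketch below uses made-up inputs to show that fullmatch() also anchors the end of the string:

```python
import re

# match() anchors only the start, so trailing characters are allowed.
print(re.match('dota', 'dota2') is not None)       # True

# fullmatch() requires the pattern to consume the whole string.
print(re.fullmatch('dota', 'dota2') is not None)   # False
print(re.fullmatch('dota.', 'dota2') is not None)  # True
```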

re.split(pattern, string, maxsplit, flags): splits the string by pattern, performing at most maxsplit splits (0 means no limit):

import re
temp = r'this is DOTA2!'
print(re.split(' ', temp, maxsplit=0, flags=re.I))
print(re.split(' ', temp, maxsplit=1, flags=re.I))
# ['this', 'is', 'DOTA2!']
# ['this', 'is DOTA2!']
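Splitting becomes more useful when the separator is itself a pattern. This is a small sketch on a made-up line with inconsistent delimiters:

```python
import re

line = 'dota, war3;  lol ,cs'

# Split on a comma or semicolon together with any surrounding whitespace.
print(re.split(r'\s*[,;]\s*', line))
# ['dota', 'war3', 'lol', 'cs']

# maxsplit limits the number of splits; the remainder is kept as the last item.
print(re.split(r'\s*[,;]\s*', line, maxsplit=1))
# ['dota', 'war3;  lol ,cs']
```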

re.findall(pattern, string, flags): finds all non-overlapping substrings that match pattern and returns them as a list:

import re
temp = r'this is dota,and this is DOTA2.'
print(re.findall('dota.', temp, re.I))
# ['dota,', 'DOTA2']
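One detail worth noting is that the shape of the returned items depends on how many capturing groups the pattern has. A brief sketch with a made-up string:

```python
import re

text = 'dota=2, war=3'

# No group: each item is the whole match.
print(re.findall(r'\w+=\d', text))        # ['dota=2', 'war=3']
# One group: each item is that group's text.
print(re.findall(r'\w+=(\d)', text))      # ['2', '3']
# Several groups: each item is a tuple of group texts.
print(re.findall(r'(\w+)=(\d)', text))    # [('dota', '2'), ('war', '3')]
```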

re.finditer(pattern, string, flags): returns an iterator that yields a match object for each match:

import re
temp = r'this is dota,and this is DOTA2.'
ite = re.finditer('dota.', temp, re.I)
for obj in ite:
    print(obj.group())
# dota,
# DOTA2
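Since finditer() yields match objects rather than plain strings, it can also report where each match was found. A minimal sketch, reusing the same sample string:

```python
import re

text = 'this is dota,and this is DOTA2.'

# Collect the matched text together with its start and end index.
positions = [(m.group(), m.start(), m.end())
             for m in re.finditer('dota.', text, re.I)]
print(positions)
# [('dota,', 8, 13), ('DOTA2', 25, 30)]
```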

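re.sub(pattern, repl, string, count, flags): replaces the substrings that match pattern with repl, which is either a string or a function whose return value is used as the replacement: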

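import re
temp = r'this-is-dota!'
# '*' means zero or more, so '6*' matches the empty string at every position
print(re.sub('6*', ' ', temp))
# prints: t h i s - i s - d o t a ! (with a leading and a trailing space)
def show(match):  # the function must accept one parameter: the match object
    print('play')
    return '+'
print(re.sub('-', show, temp))
# play
# play
# this+is+dota!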
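The replacement can also use the text that was matched. The sketch below, on made-up strings with a hypothetical helper named `shout`, shows group backreferences, a function that uses the match object, and the count argument:

```python
import re

# A string replacement can refer to captured groups with \1, \2, ...
print(re.sub(r'(\w+)-(\w+)', r'\2-\1', 'dota-war3'))      # war3-dota

# A function replacement receives the match object and returns the new text.
def shout(match):
    return match.group().upper()

print(re.sub('dota', shout, 'play dota, watch dota'))
# play DOTA, watch DOTA

# count limits how many occurrences are replaced.
print(re.sub('dota', shout, 'play dota, watch dota', count=1))
# play DOTA, watch dota
```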

re.subn(): does the same as re.sub(), but returns a tuple (new_string, number_of_substitutions).

import re
temp = r'this-is-dota!'
print(re.subn('-', '*', temp))
# the result is a tuple:
# ('this*is*dota!', 2)

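re.escape(string): escapes every character except ASCII letters and digits. (Since Python 3.7 only characters that have a special meaning in a regular expression are escaped.)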

import re
temp = r'<script>'
print(re.escape(temp))
# \<script\>   (on Python 3.4; Python 3.7+ prints <script>)
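A typical reason to call re.escape() is to match arbitrary text literally. In this sketch `user_input` is a hypothetical value standing in for text that is not under the programmer's control:

```python
import re

user_input = '1+1'   # hypothetical text that should be matched literally

# Used directly as a pattern, '+' is a quantifier, so this looks for '11', '111', ...
print(re.search(user_input, '1+1=2'))                       # None

# After escaping, every character is matched literally.
print(re.search(re.escape(user_input), '1+1=2').group())    # 1+1
```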

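re.purge(): clears the regular expression cache. The cache holds the compiled versions of the most recent patterns passed to re.compile() and to the module-level functions such as re.match() and re.search().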

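Next is the regex object returned by re.compile().

The methods of a regex object are much the same as the module-level functions that take a pattern directly. In addition, it has the following attributes:

regex.flags  # the flags value of the regex object
regex.pattern  # the pattern string the regex object was compiled from
regex.groups  # the number of capturing groups in the pattern
regex.groupindex  # a mapping from each named group to its group number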

For example:

import re
temp = 'this is dota,that is dota2,this is war3!'
r0 = re.compile('(dota)(.)(?P<nickname>....)', re.I)
print(r0.flags)  # re.I (2) plus the implicit re.UNICODE (32)
print(r0.groups)
print(r0.groupindex)
print(r0.pattern)
print(r0.search(temp).group('nickname'))
print(r0.search(temp).group(2))
# 34
# 3
# {'nickname': 3}
# (dota)(.)(?P<nickname>....)
# that
# ,

Finally, the match object.

match.expand(template): returns the string obtained by substituting the matched groups into the template

import re
temp = 'this is dota,that is dota2,this is war3!'
r0 = re.compile('(....)(.)(..)')
matchobj = r0.search(temp)
print(matchobj)
print(matchobj.expand(r'\3\2\1'))
# <_sre.SRE_Match object; span=(0, 7), match='this is'>
# is this
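Templates can refer to named groups as well as numbered ones. A short sketch, with a made-up pattern and group names chosen for illustration:

```python
import re

m = re.search(r'(?P<first>\w+) (?P<second>\w+)', 'this is dota')

# Numbered groups use \1, \2; named groups use \g<name>.
print(m.expand(r'\2 \1'))                     # is this
print(m.expand(r'\g<second>-\g<first>'))      # is-this
```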

match.group(): returns one or more subgroups of the match

import re
temp = 'this*is dota,that is dota2,this is war3!'
r0 = re.compile('(....)(?P<second>.)(..)')
tem = r0.search(temp)
print(tem.group(0))  # the entire match
print(tem.group(1))
print(tem.group(2))
print(tem.group('second'))
print(tem.group(3))
print(tem.group(1, 2))  # several arguments return a tuple
# this*is
# this
# *
# *
# is
# ('this', '*')
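All of these calls assume the search succeeded. When it may fail, search() returns None and calling group() on it raises AttributeError, so it is worth testing first. A minimal sketch, where `first_word_after` is a hypothetical helper written for this illustration:

```python
import re

def first_word_after(keyword, text):
    """Return the word following `keyword`, or None if there is no match."""
    pattern = r'\b' + re.escape(keyword) + r'\b\s+(\w+)'
    m = re.search(pattern, text)
    # search() returns None on failure, so test before calling group().
    return m.group(1) if m else None

print(first_word_after('is', 'this is dota'))     # dota
print(first_word_after('was', 'this is dota'))    # None
```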

match.groups(): returns a tuple of all the subgroups of a successful match
match.groupdict(): returns a dict of all the named subgroups, keyed by group name

import re
temp = 'this is dota,that is dota2,this is war3!'
m = re.match('(....)(.)(..)', temp)
print(m.groups())
print(type(m.groups()))
# ('this', ' ', 'is')
# <class 'tuple'>

import re
temp = 'this is dota,that is dota2,this is war3!'
m = re.match('(....)(?P<word1>.)(?P<word2>..)', temp)
print(m.groupdict())  # returns all the named groups
# {'word1': ' ', 'word2': 'is'}

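match.pos, match.endpos: the pos and endpos values that were passed to search() or match() of the regex object. These are attributes, not methods.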

import re
temp = 'this is dota,that is dota2,this is war3!'
reg = re.compile('dota')
m = reg.search(temp, 10, 30)
print(m.pos)  # the index at which the search started
print(m.endpos)  # the index beyond which the engine did not search
# 10
# 30
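Passing pos and endpos is not quite the same as slicing the string first: the reported indices stay relative to the original string. A sketch using the same sample data:

```python
import re

temp = 'this is dota,that is dota2,this is war3!'
reg = re.compile('dota')

# Searching a window of the original string: indices refer to the full string.
m = reg.search(temp, 10, 30)
print(m.span())                 # (21, 25)

# Searching a slice: indices are relative to the slice.
m2 = reg.search(temp[10:30])
print(m2.span())                # (11, 15)
```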

match.lastindex
match.lastgroup
match.re
match.string

import re
temp = 'this is dota,that is dota2,this is war3!'
m = re.match('(....)(?P<Tom>.)(?P<Mary>..)', temp)
print(m.lastindex)  # index of the last matched group, or None if no group matched
print(m.lastgroup)  # name of the last matched group, or None if it has no name
print(m.re)  # the regex object that produced this match
print(m.string)  # the string that was passed to match()
# 3
# Mary
# re.compile('(....)(?P<Tom>.)(?P<Mary>..)')
# this is dota,that is dota2,this is war3!

match.start(group), match.end(group): return the indices where the given group's match starts and ends
match.span(group): returns the (start, end) pair for the given group

import re
temp = 'this is dota,that is dota2,this is war3!'
m = re.match('(....)(?P<word1>.)(?P<word2>..)', temp)
print(m.start(1))
print(m.end(2))
print(m.span(2))
# 0
# 5
# (4, 5)
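The indices follow the usual slicing convention, so a span can be used to slice the original string directly. A brief sketch on the same data:

```python
import re

temp = 'this is dota,that is dota2,this is war3!'
m = re.match('(....)(?P<word1>.)(?P<word2>..)', temp)

# Slicing the original string with a group's span gives back the group's text.
start, end = m.span('word2')
print((start, end))                               # (5, 7)
print(temp[start:end] == m.group('word2'))        # True

# With no argument, start()/end()/span() describe the whole match.
print(m.span())                                   # (0, 7)
```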

That's all.

Original article: http://www.cnblogs.com/zhangxd-stn/p/python.html