
Python regular expressions

Date: 2016-05-13 13:33:57


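1. Finding a pattern in text: re.search()

import re

text = "you are python"
match = re.search("are", text)

# Index of the first character of 'are' in text (indexes start at 0)
print(match.start())

# Index just past the last character of 'are' in text,
# so text[match.start():match.end()] is the matched substring
print(match.end())

Output: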

4
7
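
The example above assumes the pattern is found. As a minimal sketch (the second search string "java" is made up for illustration), re.search() returns None when nothing matches, so it is safer to check the result before calling start() or end():

```python
import re

text = "you are python"

# A successful search returns a match object.
match = re.search("are", text)
if match is not None:
    # span() returns (start, end); text[start:end] is the matched substring.
    start, end = match.span()
    print(start, end, text[start:end])  # 4 7 are

# An unsuccessful search returns None instead of a match object.
missing = re.search("java", text)
print(missing)  # None
```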

 

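2. Compiling expressions: compile()

import re

text = "you are SB"
# compile() turns a pattern string into a compiled regular expression object.
# A set is used here, so the iteration order below is not guaranteed.
regexes = {re.compile(p) for p in ["SB", "NB", "MB"]}
for regex in regexes:
    if regex.search(text):
        print("seeking " + regex.pattern + " >> match")
    else:
        print("seeking " + regex.pattern + " >> not match")

Output (the order of the lines may vary):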

seeking NB >> not match
seeking SB >> match
seeking MB >> not match
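
Compiling pays off mainly when the same pattern is applied to many inputs. The following is a small sketch of that usage; the list of sample strings is hypothetical and not part of the original example:

```python
import re

# Compile once, then reuse the same pattern object on several inputs.
regex = re.compile("SB")

# Hypothetical sample inputs, for illustration only.
texts = ["you are SB", "you are NB", "SB again"]

# search() on the compiled object takes only the text to scan.
results = [bool(regex.search(t)) for t in texts]
print(regex.pattern, results)  # SB [True, False, True]
```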

 

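3. Multiple matches: findall(), finditer()

findall() returns every non-overlapping substring of the text that matches the pattern.

import re

text = "ab aa ab bb ab ba"

# Find every occurrence of 'ab' in text
print(re.findall("ab", text))

Output:

['ab', 'ab', 'ab']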
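
To make "non-overlapping" concrete, here is a short sketch with two made-up input strings. Scanning runs left to right, and a character consumed by one match cannot be part of the next:

```python
import re

# "aaaa" holds two non-overlapping copies of "aa" (at 0-2 and 2-4).
pairs = re.findall("aa", "aaaa")
print(pairs)  # ['aa', 'aa']

# "ababa" holds "aba" at 0-3; the copy at 2-5 overlaps it and is skipped.
single = re.findall("aba", "ababa")
print(single)  # ['aba']
```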

 

finditer() returns an iterator that yields match objects rather than strings.

import re

text = "ab aa ab bb ab ba"

for match in re.finditer("ab", text):
    s = match.start()
    e = match.end()
    print('found "{}" at {} to {}'.format("ab", s, e))

Output:

found "ab" at 0 to 2
found "ab" at 6 to 8
found "ab" at 12 to 14
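
Because each item is a match object, the positions can also be collected in one step. A minimal sketch using span(), which returns the (start, end) pair directly:

```python
import re

text = "ab aa ab bb ab ba"

# span() returns (start, end) for each match object yielded by finditer().
spans = [m.span() for m in re.finditer("ab", text)]
print(spans)  # [(0, 2), (6, 8), (12, 14)]
```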

 

4. Pattern syntax

import re

def test_patterns(text, patterns=()):
    # Each item in patterns is a (pattern, desc) pair:
    # pattern is the regular expression, desc is a short description of it.
    # The function searches text with each pattern in turn.
    for pattern, desc in patterns:
        print("Pattern: " + pattern, "Description: " + desc)
        print("Text: " + text)
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            print("found {} at {} to {}, result:{}".format(pattern, s, e, text[s:e]))

 

Running the function:

# Match every a followed by one b

>>> test_patterns("a ab abb abbb ", [("ab", "a followed by b")])

Pattern: ab Description: a followed by b
Text: a ab abb abbb 
found ab at 2 to 4, result:ab
found ab at 5 to 7, result:ab
found ab at 9 to 11, result:ab
>>>

 

Repetition

A pattern can express repetition in five ways, shown one by one below:

# * means the preceding element repeats zero or more times (zero times means it may be absent and the pattern still matches)
# Match a followed by zero or more b, so any a matches

>>> test_patterns("a ac ab abb abbb ", [("ab*", "a followed by zero or more b")])

Pattern: ab* Description: a followed by zero or more b
Text: a ac ab abb abbb 
found ab* at 0 to 1, result:a
found ab* at 2 to 3, result:a
found ab* at 5 to 7, result:ab
found ab* at 8 to 11, result:abb
found ab* at 12 to 16, result:abbb
>>>

 

# + means the preceding element appears at least once
# Match a followed by at least one b

>>> test_patterns("a ab abb abbb ", [("ab+", "a followed by one or more b")])

Pattern: ab+ Description: a followed by one or more b
Text: a ab abb abbb 
found ab+ at 2 to 4, result:ab
found ab+ at 5 to 8, result:abb
found ab+ at 9 to 13, result:abbb
>>>

 

# ? means the preceding element appears zero times or once
# Match a followed by zero or one b, so any a matches

>>> test_patterns("a ab abb abbb ", [("ab?", "a followed by zero or one b")])

Pattern: ab? Description: a followed by zero or one b
Text: a ab abb abbb 
found ab? at 0 to 1, result:a
found ab? at 2 to 4, result:ab
found ab? at 5 to 7, result:ab
found ab? at 9 to 11, result:ab
>>>

 

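# {n} means the preceding element appears exactly n times
# Match a followed by exactly three b

>>> test_patterns("a ab abb abbb ", [("ab{3}", "a followed by three b")])

Pattern: ab{3} Description: a followed by three b
Text: a ab abb abbb 
found ab{3} at 9 to 13, result:abbb
>>>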

 

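# {m,n} means the preceding element appears at least m and at most n times
# Match a followed by two or three b

>>> test_patterns("a ab abb abbb ", [("ab{2,3}", "a followed by two or three b")])

Pattern: ab{2,3} Description: a followed by two or three b
Text: a ab abb abbb 
found ab{2,3} at 5 to 8, result:abb
found ab{2,3} at 9 to 13, result:abbb
>>>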
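
The five repetition forms can also be compared side by side. The sketch below uses re.findall() directly instead of test_patterns(), and its input text has no trailing space, so this is an illustration rather than a rerun of the transcripts above:

```python
import re

text = "a ab abb abbb"

# One entry per repetition form: *, +, ?, {n}, {m,n}
patterns = ["ab*", "ab+", "ab?", "ab{3}", "ab{2,3}"]

results = {p: re.findall(p, text) for p in patterns}
for p in patterns:
    print(p.ljust(8), results[p])

# ab*      ['a', 'ab', 'abb', 'abbb']
# ab+      ['ab', 'abb', 'abbb']
# ab?      ['a', 'ab', 'ab', 'ab']
# ab{3}    ['abbb']
# ab{2,3}  ['abb', 'abbb']
```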

 

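Character sets

A character set is a group of characters, any one of which can match at that position in the pattern. For example, [ab] matches either a or b.

# Match every a or b

>>> test_patterns("abca", [("[ab]", "either a or b")])

Pattern: [ab] Description: either a or b
Text: abca
found [ab] at 0 to 1, result:a
found [ab] at 1 to 2, result:b
found [ab] at 3 to 4, result:a
>>>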

 

# Match a followed by a or b

>>> test_patterns("a aa ab ac", [("a[ab]", "a followed by a or b")])

Pattern: a[ab] Description: a followed by a or b
Text: a aa ab ac
found a[ab] at 2 to 4, result:aa
found a[ab] at 5 to 7, result:ab
>>>

 

# Match a followed by one or more a or b

>>> test_patterns("a aaaaa abbb accc", [("a[ab]+", "a followed by 1 or more a or b")])

Pattern: a[ab]+ Description: a followed by 1 or more a or b
Text: a aaaaa abbb accc
found a[ab]+ at 2 to 7, result:aaaaa
found a[ab]+ at 8 to 12, result:abbb
>>>
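
Character sets also accept ranges and negation, which the examples above do not cover. The sketch below is an addition for illustration, with a made-up input string: [a-c] matches any character from a to c, and a leading ^ inside the brackets inverts the set.

```python
import re

# Hypothetical input, for illustration only.
text = "a1 b2 c3"

# A range: any single character from a to c.
letters = re.findall("[a-c]", text)
print(letters)  # ['a', 'b', 'c']

# A negated set: any single character that is not a, b, c, or a space.
others = re.findall("[^a-c ]", text)
print(others)  # ['1', '2', '3']
```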

 


Original article: http://www.cnblogs.com/singeldiego/p/5487442.html
