My way to Python - Day04 - Modules

Date: 2015-12-05 00:24:57


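The re module

What is a regular expression

A regular expression (regex) is, simply put, a set of rules for searching and matching strings, which in turn enables string operations such as replacement and deletion.

In the POSIX tradition, regular expressions come in two flavors: basic regular expressions (BRE) and extended regular expressions (ERE).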

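What are regular expressions in Python

Python's regular expression support lives in the re module, which uses its own Perl-style syntax. All regular expression operations in Python go through re.

Commonly used functions in the re module include match, search, findall, sub, and split. (replace is a method of str, not a function in re.)

Commonly used matching rules: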

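## Character matching:
.     # matches any character except a newline
\w    # matches a letter, digit, or underscore (also Chinese and other Unicode word characters when the pattern is Unicode-aware)
\s    # matches any whitespace character
\d    # matches a digit
\b    # matches a word boundary (the start or end of a word)
^     # matches the start of the string
$     # matches the end of the string

## Repetition:
*     # zero or more times
+     # one or more times
?     # zero or one time
{n}   # exactly n times
{n,}  # n or more times
{n,m} # between n and m times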
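
To make the rules above concrete, here is a small sketch that applies several of them to one sample string. The sample text and variable names are made up for illustration.

```python
import re

text = "order 66 shipped\tto room 101"  # hypothetical sample input

# \d+ : one or more digits
numbers = re.findall(r"\d+", text)

# \b\w+\b : whole words made of letters, digits, or underscores
words = re.findall(r"\b\w+\b", text)

# \s : a single whitespace character (spaces and the tab)
whitespace = re.findall(r"\s", text)

# ^ and $ anchor the pattern to the start and end of the string
starts_with_order = re.search(r"^order", text) is not None
ends_with_three_digits = re.search(r"\d{3}$", text) is not None

# {n,m} : between n and m repetitions
two_to_three_digits = re.findall(r"\b\d{2,3}\b", text)

print(numbers)              # ['66', '101']
print(words)                # ['order', '66', 'shipped', 'to', 'room', '101']
print(len(whitespace))      # 5
print(two_to_three_digits)  # ['66', '101']
```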

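The examples below walk through the commonly used functions of the re module.

1. match(pattern, string, flags=0)

Tries to match the pattern at the beginning of the string. Returns a single match object, or None if the string does not start with a match.

pattern: the regular expression
string: the string to match against
flags: flags that control how the regular expression is matched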

import re

# match() succeeds only if the pattern matches at the start of the string
obj = re.match(r'\d+', '123uuasf')
if obj:
    print(obj.group())  # 123

# flags (as defined in the Python 2 standard library's re.py)
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode locale
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
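
As a rough illustration of what the most common flags do, the sketch below applies re.I, re.M, re.S, and re.X to a made-up two-line string. The sample text and variable names are assumptions for the example.

```python
import re

text = "Hello\nworld"  # hypothetical sample input

# IGNORECASE: "hello" also matches "Hello"
ignore_case = re.match(r"hello", text, re.I) is not None

# MULTILINE: ^ also matches right after each newline
line_starts = re.findall(r"^\w+", text, re.M)

# DOTALL: . also matches the newline character
dot_default = re.match(r"Hello.world", text) is not None
dot_all = re.match(r"Hello.world", text, re.S) is not None

# Flags can be combined with |
combined = re.findall(r"^[a-z]+", text, re.I | re.M)

# VERBOSE: whitespace and comments inside the pattern are ignored
decimal = re.compile(r"""
    \d+   # integer part
    \.    # decimal point
    \d+   # fractional part
""", re.X)
decimal_match = decimal.match("3.14").group()

print(ignore_case)           # True
print(line_starts)           # ['Hello', 'world']
print(dot_default, dot_all)  # False True
print(combined)              # ['Hello', 'world']
print(decimal_match)         # 3.14
```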

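2. search(pattern, string, flags=0)

Scans the whole string for the pattern and returns the first match only (or None if there is none).

import re

obj = re.search(r'\d+', 'u123uu888asf')
if obj:
    print(obj.group())  # 123 (the first match only)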
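
The difference between match and search is easiest to see on the same input. The following sketch reuses the string from the example above; the variable names are arbitrary.

```python
import re

text = "u123uu888asf"

# match() anchors at position 0; the string starts with "u", so there is no match
m = re.match(r"\d+", text)

# search() scans forward and stops at the first run of digits
s = re.search(r"\d+", text)

first = s.group()  # the matched text
span = s.span()    # (start, end) indexes of the match

print(m)      # None
print(first)  # 123
print(span)   # (1, 4)
```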

3. group and groups

group() returns the whole match or one numbered subgroup; groups() returns all subgroups as a tuple.

import re

a = "123abc456"
m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", a)

print(m.group())    # 123abc456 (the whole match)
print(m.group(0))   # 123abc456 (group 0 is also the whole match)
print(m.group(1))   # 123
print(m.group(2))   # abc

print(m.groups())   # ('123', 'abc', '456')
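
Numbered groups can become hard to follow in longer patterns. As an optional sketch, the same string can be matched with named groups using the `(?P<name>...)` syntax; the group names head, word, and tail are invented for this example.

```python
import re

a = "123abc456"
m = re.search(r"(?P<head>[0-9]+)(?P<word>[a-z]+)(?P<tail>[0-9]+)", a)

word = m.group("word")    # look up a group by name
pair = m.group(1, 3)      # several groups at once, returned as a tuple
named = m.groupdict()     # all named groups as a dict
word_span = m.span(2)     # (start, end) of the second group

print(word)       # abc
print(pair)       # ('123', '456')
print(named)      # {'head': '123', 'word': 'abc', 'tail': '456'}
print(word_span)  # (3, 6)
```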

4. findall(pattern, string, flags=0)

The two functions above (match and search) return only a single match. To get every non-overlapping match in the string, use findall.

import re

obj = re.findall(r'\d+', 'fa123uu888asf')
print(obj)  # ['123', '888']
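
One detail worth knowing: when the pattern contains capturing groups, findall returns the group contents rather than the whole match. The sketch below shows this, plus re.finditer for cases where the match positions are needed. Variable names are arbitrary.

```python
import re

text = "fa123uu888asf"

# One capturing group: the list holds that group's text for each match
digits_only = re.findall(r"[a-z]+(\d+)", text)

# Several capturing groups: the list holds one tuple per match
pairs = re.findall(r"([a-z]+)(\d+)", text)

# finditer yields match objects, so positions are available too
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(digits_only)  # ['123', '888']
print(pairs)        # [('fa', '123'), ('uu', '888')]
print(positions)    # [('123', 2), ('888', 7)]
```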

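5. sub(pattern, repl, string, count=0, flags=0)

Replaces every match of the pattern with repl. A nonzero count limits the number of replacements.

import re

content = "123abc456"
new_content = re.sub(r'\d+', 'sb', content)
# new_content = re.sub(r'\d+', 'sb', content, count=1)  # replace only the first match
print(new_content)  # sbabcsb

This is more powerful than str.replace, because the text to replace is described by a pattern rather than a fixed substring.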
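
A few things sub can do that str.replace cannot are sketched below: backreferences in the replacement, a function as the replacement, and subn for counting replacements. The replacement strings and variable names are chosen only for illustration.

```python
import re

content = "123abc456"

# count=1 replaces only the first match
first_only = re.sub(r"\d+", "#", content, count=1)

# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r"(\d+)([a-z]+)", r"\2\1", content)

# repl may be a function that receives the match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), content)

# subn also reports how many replacements were made
replaced, n = re.subn(r"\d+", "#", content)

print(first_only)   # #abc456
print(swapped)      # abc123456
print(doubled)      # 246abc912
print(replaced, n)  # #abc# 2
```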

6. split(pattern, string, maxsplit=0, flags=0)

Splits the string at every match of the pattern. A nonzero maxsplit limits the number of splits.

import re

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'\*', content)
# new_content = re.split(r'\*', content, maxsplit=1)
print(new_content)

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
# split on any run of arithmetic operators
new_content = re.split(r'[+\-*/]+', content)
# new_content = re.split(r'\*', content, maxsplit=1)
print(new_content)

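inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'
inpp = re.sub(r'\s+', '', inpp)  # strip all whitespace
# split once around the first parenthesized expression with no nested parentheses, such as (-40-5);
# the capturing group keeps the contents of the parentheses in the result
new_content = re.split(r'\(([+\-*/]?\d+[+\-*/]?\d+)\)', inpp, maxsplit=1)
print(new_content)

This is more powerful than str.split, because the delimiter is a pattern rather than a fixed string.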
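
The long expressions above make the results hard to read, so here is a smaller sketch of the same ideas on a short, made-up expression: splitting on a character class, keeping the delimiters with a capturing group, and limiting the number of splits.

```python
import re

expr = "1-2*3+40/5"  # hypothetical short expression

# Split on any single arithmetic operator
operands = re.split(r"[+\-*/]", expr)

# A capturing group around the delimiter keeps the operators in the result
tokens = re.split(r"([+\-*/])", expr)

# maxsplit=1 stops after the first split
head_tail = re.split(r"[+\-*/]", expr, maxsplit=1)

# Several different delimiters at once, which str.split cannot do in one call
fields = re.split(r"[,;]\s*", "a, b;c")

print(operands)   # ['1', '2', '3', '40', '5']
print(tokens)     # ['1', '-', '2', '*', '3', '+', '40', '/', '5']
print(head_tail)  # ['1', '2*3+40/5']
print(fields)     # ['a', 'b', 'c']
```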

Matching an IPv4 address:

^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
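
A minimal sketch of how this pattern might be used for validation follows. The helper name is_ipv4 and the test addresses are assumptions for the example; the pattern itself is the one shown above.

```python
import re

# Each octet is 250-255, 200-249, or 0-199; three more octets follow, each preceded by a dot
IP_PATTERN = re.compile(
    r"^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$"
)

def is_ipv4(text):
    """Return True if text looks like a dotted-quad IPv4 address (hypothetical helper)."""
    return IP_PATTERN.match(text) is not None

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False
print(is_ipv4("1.2.3"))            # False
```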

Matching a mobile phone number (11 digits, starting with 13, 14, 15, or 18):

^1[3458]\d{9}$
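
Inside a character class, `|` is a literal character rather than alternation, so the sketch below writes the class for the second digit as `[3458]`. The helper name is_mobile and the sample numbers are made up for illustration, and the prefix list reflects only the prefixes named in this article.

```python
import re

# 1, then one of 3/4/5/8, then nine more digits (11 digits in total)
MOBILE_PATTERN = re.compile(r"^1[3458]\d{9}$")

def is_mobile(text):
    """Return True if text has the 11-digit mobile format described above (hypothetical helper)."""
    return MOBILE_PATTERN.match(text) is not None

print(is_mobile("13812345678"))  # True
print(is_mobile("12812345678"))  # False: second digit is not 3, 4, 5, or 8
print(is_mobile("1381234567"))   # False: only 10 digits
print(is_mobile("1|812345678"))  # False: "|" is not a valid second character
```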

References:

http://www.crifan.com/files/doc/docbook/python_topic_re/release/html/python_topic_re.html

http://www.cnblogs.com/wupeiqi/articles/4963027.html


Original article: http://www.cnblogs.com/thatsit/p/5020841.html
