标签:
python中re模块提供了正则表达式相关操作
字符:
. 匹配除换行符以外的任意字符
\w 匹配字母或数字或下划线或汉字
\s 匹配任意的空白符
\d 匹配数字
\b 匹配单词的开始或结束
^ 匹配字符串的开始
$ 匹配字符串的结束
次数:
* 重复零次或更多次
+ 重复一次或更多次
? 重复零次或一次
{n} 重复n次
{n,} 重复n次或更多次
{n,m} 重复n到m次
match
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
# match,从起始位置开始匹配,匹配成功返回一个对象,未匹配成功返回None match(pattern, string, flags = 0 ) # pattern: 正则模型 # string : 要匹配的字符串 # falgs : 匹配模式 X VERBOSE Ignore whitespace and comments for nicer looking RE‘s. I IGNORECASE Perform case - insensitive matching. M MULTILINE "^" matches the beginning of lines (after a newline) as well as the string. "$" matches the end of lines (before a newline) as well as the end of the string. S DOTALL "." matches any character at all , including the newline. A ASCII For string patterns, make \w, \W, \b, \B, \d, \D match the corresponding ASCII character categories (rather than the whole Unicode categories, which is the default). For bytes patterns, this flag is the only available behaviour and needn‘t be specified. L LOCALE Make \w, \W, \b, \B, dependent on the current locale. U UNICODE For compatibility only. Ignored for string patterns (it is the default), and forbidden for bytes patterns. |
search
1
2
|
# search,浏览整个字符串去匹配第一个,未匹配成功返回None # search(pattern, string, flags=0) |
findall
1
2
3
|
# findall,获取非重复的匹配列表;如果有一个组则以列表形式返回,且每一个匹配均是字符串;如果模型中有多个组,则以列表形式返回,且每一个匹配均是元祖; # 空的匹配也会包含在结果中 #findall(pattern, string, flags=0) |
sub
1
2
3
4
5
6
7
8
|
# sub,替换匹配成功的指定位置字符串 sub(pattern, repl, string, count = 0 , flags = 0 ) # pattern: 正则模型 # repl : 要替换的字符串或可执行对象 # string : 要匹配的字符串 # count : 指定匹配个数 # flags : 匹配模式 |
# 与分组无关 origin = "hello alex bcd alex lge alex acd 19" r = re.sub("a\w+", "999", origin, 2) print(r)
split
1
2
3
4
5
6
7
|
# split,根据正则匹配分割字符串 split(pattern, string, maxsplit = 0 , flags = 0 ) # pattern: 正则模型 # string : 要匹配的字符串 # maxsplit:指定分割个数 # flags : 匹配模式 |
标签:
原文地址:http://www.cnblogs.com/cfj271636063/p/5818562.html