The Python re Module

Date: 2015-01-27 20:13:04

0x01 Common metacharacters and special symbols

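.    Matches any single character except a newline
#f.o    matches fao, foo, etc.

^    Matches the start of the string
#^From    matches any string that starts with From

$    Matches the end of the string
#/bin/tcsh$    matches any string that ends with /bin/tcsh
#^/bin/tcsh$    matches a string consisting only of /bin/tcsh

[]    Matches any one character from the given character set
#b[aeiu]t    matches bat, bet, bit, but

[^...]    Matches any one character that is not in the character set
?    Repeats the preceding expression 0 or 1 times
*    Repeats the preceding expression 0 or more times
+    Repeats the preceding expression 1 or more times
{m}    Repeats the preceding expression exactly m times
{m,n}    Repeats the preceding expression m to n times (no space after the comma)
()    Matches the regular expression inside the parentheses and saves it as a subgroup
\d    Matches any digit
\D    Matches any non-digit
\s    Matches any whitespace character
\S    Matches any non-whitespace character
\w    Matches any letter, digit, or underscore
\W    Matches any character other than a letter, digit, or underscore
\b    Matches a word boundary (the start or end of a word)
\nn    Matches the saved subgroup numbered nn
\c    Matches the special character c literally (cancels its special meaning)
\A(\Z)    Matches the start (end) of the string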
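
A few of the entries above can be checked directly in the interpreter. The following is only a minimal sketch in Python 3 syntax; the sample strings are made up for illustration and do not come from the original post.

```python
import re

# . matches any single character except a newline
dot_hits = re.findall(r'f.o', 'fao foo f\no')

# ^ and $ anchor the pattern to the start and end of the string
exact = re.search(r'^/bin/tcsh$', '/bin/tcsh')
longer = re.search(r'^/bin/tcsh$', '/usr/bin/tcsh')

# [] matches one character from the set, [^...] one character not in it
vowel_words = re.findall(r'b[aeiu]t', 'bat bet bit bot but')
non_digits = re.findall(r'[^0-9]', 'a1b2')

# \b marks a word boundary; \1 refers back to the first saved subgroup
doubled = re.search(r'\b(\w+) \1\b', 'this is is a test')

print(dot_hits)          # ['fao', 'foo']
print(exact is not None, longer is None)  # True True
print(vowel_words)       # ['bat', 'bet', 'bit', 'but']
print(non_digits)        # ['a', 'b']
print(doubled.group(1))  # is
```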

0x02 Using the closure operators (*, +, ?, {}) to match repeated occurrences

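[dn]ot?        #do, no, dot, not
0?[1-9]        #any single digit from 1 to 9, optionally preceded by a 0
[0-9]{15,16}     #a run of 15 or 16 digits
</?[^>]+>        #anything shaped like an HTML tag (the pattern does not check that the tag is valid)
\w+-\d+        #a string of word characters, a hyphen, then at least one digit
......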
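
To see how these patterns behave, here is a small sketch in Python 3 syntax. The input strings are invented for illustration, and the \b anchors and the use of re.fullmatch() are additions that are not part of the patterns listed above.

```python
import re

# ? makes the trailing t optional; \b keeps "dog" from matching as "do"
words = re.findall(r'\b[dn]ot?\b', 'do no dot not dog')

# {15,16} needs 15 or 16 digits; fullmatch() rejects shorter or longer input
long_enough = re.fullmatch(r'[0-9]{15,16}', '1' * 15)
too_short = re.fullmatch(r'[0-9]{15,16}', '1' * 14)

# The tag pattern accepts anything between < and >, valid HTML or not
tags = re.findall(r'</?[^>]+>', '<p>Hi <b>there</b></p>')
lookalike = re.findall(r'</?[^>]+>', '1 <2 and 3> 2')

# + requires at least one character on each side of the hyphen
ids = re.findall(r'\w+-\d+', 'part-42 abc-x id_7-001')

print(words)      # ['do', 'no', 'dot', 'not']
print(tags)       # ['<p>', '<b>', '</b>', '</p>']
print(lookalike)  # ['<2 and 3>']
print(ids)        # ['part-42', 'id_7-001']
```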

0x03 Common functions

1.compile(pattern, flags=0)    Compiles the regular expression pattern, with optional flags, and returns a regex object.

>>> import re
>>> test = 'hello world'
>>> r = re.compile(r'\w*e\w*')    # match words that contain an e
>>> r.findall(test)
['hello']
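
Compiling is mostly useful when the same pattern is applied many times, and the flags argument changes how it matches. A minimal sketch in Python 3 syntax, assuming a made-up list of input lines; re.IGNORECASE is one of the standard flags in the re module.

```python
import re

# Compile once with a flag, then reuse the regex object for every line
word_with_e = re.compile(r'\w*e\w*', re.IGNORECASE)

lines = ['hello world', 'EXTRA Large', 'sky']  # sample input, invented here
matches = [word_with_e.findall(line) for line in lines]

print(matches)  # [['hello'], ['EXTRA', 'Large'], []]
```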

2.match(pattern, string, flags=0)    Tries to match the regular expression pattern at the beginning of string, with optional flags. Returns a match object on success, otherwise None.

>>> r = re.match(r'\w+@\w+\.com', 'example@gmail.com')
>>> print(r.group(0))
example@gmail.com
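
Because match() returns None on failure, calling group() on the result without a check raises AttributeError. The sketch below, in Python 3 syntax, shows one way to guard against that. The helper name split_address, the two capturing groups, and the trailing $ are assumptions added for illustration, not part of the example above.

```python
import re

def split_address(address):
    """Return (user, domain) for a simple .com address, or None if it does not match."""
    # Hypothetical helper: groups and the trailing $ are added for this sketch
    m = re.match(r'(\w+)@(\w+\.com)$', address)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_address('example@gmail.com'))  # ('example', 'gmail.com')
print(split_address('not an address'))     # None
```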

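3.search(pattern, string, flags=0)    Searches string for the first occurrence of the regular expression pattern, with optional flags. Returns a match object for that first occurrence on success, otherwise None.

re.match() and re.search() are called the same way. The only difference is that re.match() matches only at the beginning of the string and returns None if the pattern does not match there, while re.search() scans through the string until it finds a match and returns it.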
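
The difference is easy to see on a pattern that occurs only in the middle of a string. A short sketch in Python 3 syntax, with a sample sentence chosen for illustration:

```python
import re

text = 'the quick brown fox'

at_start = re.match(r'quick', text)    # "quick" is not at position 0
anywhere = re.search(r'quick', text)   # search() scans forward to find it
prefix = re.match(r'the', text)        # match() succeeds at the beginning

print(at_start)         # None
print(anywhere.span())  # (4, 9)
print(prefix.span())    # (0, 3)
```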

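4.findall(pattern, string [, flags])    Searches string for all non-overlapping occurrences of the regular expression pattern and returns them as a list of strings (not match objects). Repeated matches are all kept, as the two foo entries below show.

>>> re.findall(r'\w*o\w*', 'hot hello foo foo')
['hot', 'hello', 'foo', 'foo']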
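
What findall() returns depends on the number of subgroups in the pattern: the whole match when there are none, the group's text when there is one, and a tuple per match when there are several. A minimal sketch in Python 3 syntax, using an invented key=value string:

```python
import re

text = 'alice=1 bob=22 carol=333'  # sample input, invented here

whole = re.findall(r'\w+=\d+', text)           # no groups: whole matches
one_group = re.findall(r'\w+=(\d+)', text)     # one group: that group only
two_groups = re.findall(r'(\w+)=(\d+)', text)  # several groups: tuples

print(whole)       # ['alice=1', 'bob=22', 'carol=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('alice', '1'), ('bob', '22'), ('carol', '333')]
```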

5.sub(pattern, repl, string, count=0, flags=0)    Replaces every place in string that matches the regular expression pattern with repl. If count is not given (or is 0), all matches are replaced; otherwise at most count matches are replaced.

>>> re.sub(r'X', 'Mr.Smith', 'attn: X, Dear X,')
'attn: Mr.Smith, Dear Mr.Smith,'
>>> re.sub(r'X', 'Mr.Smith', 'attn: X, Dear X,', count=1)
'attn: Mr.Smith, Dear X,'
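
The replacement does not have to be a fixed string. It can refer back to saved subgroups with \1, \2 and so on, or it can be a function that receives each match object and returns the replacement text. A small sketch in Python 3 syntax, with sample inputs made up for illustration:

```python
import re

# Backreferences in repl: swap "Last, First" into "First Last"
swapped = re.sub(r'(\w+), (\w+)', r'\2 \1', 'Smith, John')

# Callable repl: called once per match, must return a string
doubled = re.sub(r'\d+', lambda m: str(int(m.group(0)) * 2), 'a1 b20 c300')

print(swapped)  # John Smith
print(doubled)  # a2 b40 c600
```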

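6.split(pattern, string, maxsplit=0, flags=0)    Splits string into a list at every place where the regular expression pattern matches and returns the list of pieces. At most maxsplit splits are made; by default (maxsplit=0) the string is split at every match.

>>> re.split(r':', 'str1:str2:str3')
['str1', 'str2', 'str3']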
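
Two details of split() are worth trying out: maxsplit limits the number of cuts, and a capturing group in the pattern keeps the separators in the result. The following is a brief sketch in Python 3 syntax, with input strings invented for illustration:

```python
import re

# maxsplit=1 cuts only at the first separator
first_only = re.split(r':', 'str1:str2:str3', maxsplit=1)

# A capturing group around the separator keeps it in the output list
with_seps = re.split(r'(:)', 'str1:str2')

# A character class allows several different separators at once
mixed = re.split(r'[,;]\s*', 'a, b;c,  d')

print(first_only)  # ['str1', 'str2:str3']
print(with_seps)   # ['str1', ':', 'str2']
print(mixed)       # ['a', 'b', 'c', 'd']
```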

An example that splits the output of the who command:

xt@linux-t2lc:~> who > who.txt
xt      :0           2015-01-25 09:34 (:0)
xt      pts/1        2015-01-26 13:41 (:0)
xt      pts/2        2015-01-27 15:05 (:0)
xt      pts/3        2015-01-27 15:05 (:0)
xt      pts/4        2015-01-27 15:05 (:0)

#who.py
#!/usr/bin/env python
# coding=utf-8
import re

# Split each line on runs of two or more whitespace characters, or on a tab
with open('who.txt', 'r') as f:
    for eachLine in f:
        print(re.split(r'\s\s+|\t', eachLine.strip()))

#output
['xt', ':0', '2015-01-25 09:34 (:0)']
['xt', 'pts/1', '2015-01-26 13:41 (:0)']
['xt', 'pts/2', '2015-01-27 15:05 (:0)']
['xt', 'pts/3', '2015-01-27 15:05 (:0)']
['xt', 'pts/4', '2015-01-27 15:05 (:0)']
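
As an alternative to splitting, the same lines can be taken apart with named subgroups, which gives each field a label. This is only a sketch in Python 3 syntax: the field names user, line, date and time are assumptions chosen here, and the pattern is fitted to the sample who output shown above rather than to every format who can produce.

```python
import re

# Field names are hypothetical; the layout follows the sample who.txt above
who_line = re.compile(
    r'(?P<user>\S+)\s+(?P<line>\S+)\s+'
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2})'
)

sample = 'xt      pts/1        2015-01-26 13:41 (:0)'
fields = who_line.match(sample).groupdict()

print(fields['user'], fields['line'], fields['date'], fields['time'])
# xt pts/1 2015-01-26 13:41
```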

Original article: http://www.cnblogs.com/hackxt/p/4252808.html
