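Regular expressions can be used to extract data from text.
Quantifiers are greedy by default: they match as many characters as possible.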
>>> import re
>>> s="I am 19 years old"
>>> re.search(r"\d+",s)
<_sre.SRE_Match object at 0x0000000002F63ED0>
>>> re.search(r"\d+",s).group()
'19'
>>>
>>> s="I am 19 years 20 old 30"
>>> re.findall(r"\d",s)
['1', '9', '2', '0', '3', '0']
>>> re.findall(r"\d+",s)
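['19', '20', '30']
Regular expressions can match data and extract it,
and they can test whether a sentence contains a given string.
In re.match(r"\d+","123abc") it is best to keep the r prefix (a raw string), so that Python's own backslash escapes do not alter the pattern.
r"\d" matches a digit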
>>> re.match(r"\d+","123abc")
<_sre.SRE_Match object at 0x0000000002F63ED0>
>>> re.match("\d+","123abc")
<_sre.SRE_Match object at 0x000000000306B030>
>>> re.match("\d+","a123abc")
>>> re.match("\d+","123abc").group()
'123'
>>> re.match("\d+","123abc 1sd").group()
'123'
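The advice about the r prefix can be illustrated with a short sketch. It assumes Python 3, and the variable names (plain, raw, text) are made up for this example. "\d" happens to work without the prefix because Python has no escape called \d, but "\b" does not, since Python turns it into a backspace character before the regex engine ever sees it.

```python
import re

# Without the r prefix, Python converts "\b" into a single backspace character.
plain = "\bcat\b"
# With the r prefix, the backslash survives and the regex engine
# reads \b as a word boundary.
raw = r"\bcat\b"

print(len(plain), len(raw))  # 5 7

text = "a cat sat"
plain_hit = re.search(plain, text)  # searches for backspace + "cat" + backspace
raw_hit = re.search(raw, text)      # searches for the whole word "cat"

print(plain_hit)        # None
print(raw_hit.group())  # cat
```

Writing every pattern as a raw string avoids having to remember which escapes Python itself interprets.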
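re.search(r"\d+", s) returns the first match found anywhere in the string
re.findall(r"\d+", s) also matches anywhere in the string, and returns every match as a list
re.match(r"\d+", s) only matches starting at the first character of the string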
>>> re.match("\d+","123abc 1sd").group()
'123'
>>> re.search("\d+","123abc 1sd").group()
'123'
>>> re.search("\d+","a123abc 1sd").group()
'123'
>>> re.match("\d+","a123abc 1sd").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match("\d+","a123abc 1sd")
>>> re.findall(r"\d+","a1b2c3")
['1', '2', '3']
>>> re.match(r"\d+","a1b2c3")
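The AttributeError above comes from calling .group() on None, which is what re.match and re.search return when nothing matches. One way to guard against it is sketched below, assuming Python 3. The helper names first_number and leading_number are hypothetical and exist only for this illustration.

```python
import re

def first_number(text):
    """Return the first run of digits anywhere in text, or None."""
    m = re.search(r"\d+", text)
    return m.group() if m is not None else None

def leading_number(text):
    """Return the digits at the very start of text, or None."""
    m = re.match(r"\d+", text)
    return m.group() if m is not None else None

sample = "a123abc 1sd"
print(first_number(sample))        # 123
print(leading_number(sample))      # None
print(re.findall(r"\d+", sample))  # ['123', '1']
```

re.findall never needs this check, because it returns an empty list when there is no match.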
r"\D" matches a non-digit character
re.match(r"\D+","a1b2c3") matches the leading run of non-digits
>>> re.match(r"\D+","a1b2c3").group()
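'a'
>>> re.match(r"\D+","abc1b2c3").group()
'abc'
re.match(r"\D+\d+","ab12 bb").group() applied to "ab12 bb" returns "ab12": one or more non-digits followed by one or more digits
>>> re.match(r"\D+\d+","ab12 bb").group()
'ab12'
r"\s" matches a whitespace character
re.match(r"\s+", s) matches leading whitespace; match always starts at the first character of the string, so it fails when the string does not begin with whitespace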
>>> re.match(r"\s+","ab12 bb").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r"\s+"," bb").group()
' '
r"\S" matches a non-whitespace character
re.match(r"\S+","eee bb") matches non-whitespace characters up to the first whitespace
>>> re.match(r"\S+","eee bb").group()
'eee'
>>> print 'a'  # print shows the str() form
a
>>> 'a'  # the interactive prompt shows the repr() form
'a'
r"\w+" matches one or more word characters: letters, digits, and the underscore
re.findall(r"\w+"," sdf") returns every run of word characters
>>> re.findall(r"\w+"," sdf")
['sdf']
>>> re.findall(r"\w+"," 12 sdf")
['12', 'sdf']
>>> re.findall(r"\w+"," 12 sdf we")
['12', 'sdf', 'we']
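A small sketch of what \w covers beyond plain letters and digits, assuming Python 3 (the transcript above looks like Python 2, where str patterns are ASCII by default). The sample strings and variable names are made up for illustration.

```python
import re

# The underscore counts as a word character; "=", "+", "-" and spaces do not.
words = re.findall(r"\w+", "user_name = id42 + x-y")
print(words)  # ['user_name', 'id42', 'x', 'y']

# In Python 3, \w on a str pattern also matches non-ASCII letters
# unless the re.ASCII flag is given.
unicode_words = re.findall(r"\w+", "caf\u00e9 au lait")
ascii_words = re.findall(r"\w+", "caf\u00e9 au lait", re.ASCII)
print(unicode_words)  # ['café', 'au', 'lait']
print(ascii_words)    # ['caf', 'au', 'lait']
```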
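r"\W+" matches one or more non-word characters (anything other than letters, digits, and the underscore)
re.match(r"\W+"," sf fd").group() returns the leading run of non-word characters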
>>> re.match(r"\W+"," sf fd").group()
' '
>>> re.match(r"\W+"," sf $% @# fd").group()
' '
>>> re.match(r"\W+","$#% sf $% @# fd").group()
'$#% '
Quantifiers
r"\w\w" matches exactly two word characters
>>> re.match(r"\w\w","ww we").group()
'ww'
>>> re.match(r"\w\w","12 we").group()
'12'
r"\w{2}" also matches exactly two word characters
>>> re.match(r"\w{2}","12 we").group()
'12'
>>> re.match(r"\w{2}","123 we").group()
'12'
>>> re.match(r"\w{2}","1 23 we").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r"\w{2,4}","1 23 we").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
r"\w{2,4}" matches two to four word characters and, being greedy, takes as many as it can
>>> re.match(r"\w{2,4}","123 we").group()
'123'
>>> re.match(r"\w{2,4}","12 we").group()
'12'
>>> re.match(r"\w{2,4}","1235 we").group()
'1235'
>>> re.match(r"\w{2,4}","12435 we").group()
'1243'
>>> re.match(r"\w{5}","12435 we").group()
'12435'
>>> re.match(r"\w{5}","srsdf we").group()
'srsdf'
>>>
r"\w{2,4}?" suppresses greediness: the trailing ? makes the quantifier lazy, so it matches as few characters as possible
>>> re.match(r"\w{2,4}?","12435 we").group()
'12'
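The difference between greedy and lazy quantifiers is easiest to see when a delimiter appears more than once. The sketch below assumes Python 3. The sample string and variable names are invented for illustration, and regular expressions are used on the tag-like text only to show the quantifier behavior, not as a way to parse HTML.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end of the string, then backs up to the last ">".
greedy = re.findall(r"<.+>", html)
# Lazy: .+? stops at the first ">" it can reach.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```

The same ? suffix works after *, +, ? and {m,n}.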
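r"\w?" matches zero or one word character; when no word character is present it returns an empty string,
because zero occurrences still count as a match
>>> re.match(r"\w?","12435 we").group()
'1'
>>> re.match(r"\w?"," 12435 we").group()
''
re.findall(r"\w?"," 12 we") also reports an empty match at each position with no word character, including the end of the string
>>> re.findall(r"\w?"," 12 we")
['', '1', '2', '', 'w', 'e', '']
re.findall(r"\w"," 12 we") matches exactly one word character at a time
>>> re.findall(r"\w"," 12 we")
['1', '2', 'w', 'e']
r"\w*" matches zero or more word characters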
>>> re.match(r"\w*"," 12435 we").group()
''
>>> re.match(r"\w*","12435 we").group()
'12435'
>>> re.match(r"\w*"," 12435 we")
<_sre.SRE_Match object at 0x0000000002F63ED0>
>>> re.match(r"\w*","12435 we")
<_sre.SRE_Match object at 0x000000000306C030>
>>> re.match(r"\w*","12435 we").group()
'12435'
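Because * accepts zero repetitions, r"\w*" returns a match object even when it consumes nothing, as the transcript above shows. A minimal sketch of how that differs from +, assuming Python 3 and using illustrative variable names:

```python
import re

text = " 12435 we"

star = re.match(r"\w*", text)  # zero repetitions allowed: empty match at position 0
plus = re.match(r"\w+", text)  # needs at least one word character: no match

print(repr(star.group()), star.span())  # '' (0, 0)
print(plus)                             # None

# A match object is always truthy, even when it matched the empty string,
# so "if star:" does not tell you whether any characters were consumed.
print(bool(star))  # True
```

When an empty result should count as a failure, + is usually the safer quantifier.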
r"a.b" matches a, then any single character except a newline, then b
To match the literal string "a.b", escape the dot: r"a\.b"
>>> re.match(r"a.b","axb")
<_sre.SRE_Match object at 0x0000000001DCC510>
>>> re.match(r"a.b","a\nb")
>>> re.match(r"a\.b","a.b")
<_sre.SRE_Match object at 0x00000000022BE4A8>
>>> re.match(r"a\.b","axb")
>>> re.match(r".","axb").group()
'a'
>>> re.match(r"a.b","axb").group()
'axb'
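Two related tools from the standard re module are worth a quick sketch here, assuming Python 3: the re.DOTALL flag lets the dot match a newline as well, and re.escape builds a pattern that matches a string literally. The variable names are illustrative only.

```python
import re

# By default the dot does not match a newline.
no_flag = re.match(r"a.b", "a\nb")
# With re.DOTALL the dot matches any character, newline included.
with_flag = re.match(r"a.b", "a\nb", re.DOTALL)

print(no_flag)                      # None
print(with_flag.group() == "a\nb")  # True

# re.escape backslash-escapes special characters such as the dot,
# so the resulting pattern only matches the literal text.
literal = re.escape("a.b")
literal_hit = re.match(literal, "a.b")
literal_miss = re.match(literal, "axb")

print(literal_hit.group())  # a.b
print(literal_miss)         # None
```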
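Exercise: match the IP address below. The rule is three dots separating four groups of digits, each between 0 and 255. The patterns here only check the shape (one to three digits per group), not the 0 to 255 range.
>>> s="i find a ip:1.2.22.123 ! yes"
>>> re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",s).group()
'1.2.22.123'
re.search(r"(\d{1,3}\.){3}\d{1,3}",s) does the same with a group repeated three times
>>> re.search(r"(\d{1,3}\.){3}\d{1,3}",s).group()
'1.2.22.123'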
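The exercise asks for each number to lie between 0 and 255, which \d{1,3} alone does not enforce (it would accept 999). One possible approach, sketched below for Python 3, is to let the regex find the shape and then check the range in ordinary code. The names IP_SHAPE and find_ipv4 are hypothetical, and the sketch makes no attempt to handle edge cases such as leading zeros or digits directly adjacent to the address.

```python
import re

# (?:...) is a non-capturing group, so findall returns the whole match
# rather than only the last repetition of the group.
IP_SHAPE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def find_ipv4(text):
    """Return dotted-quad candidates in text whose four parts are all 0-255."""
    result = []
    for candidate in IP_SHAPE.findall(text):
        if all(0 <= int(part) <= 255 for part in candidate.split(".")):
            result.append(candidate)
    return result

print(find_ipv4("i find a ip:1.2.22.123 ! yes"))  # ['1.2.22.123']
print(find_ipv4("bad one: 999.1.1.1"))            # []
```

Splitting the work this way keeps the pattern readable; encoding the 0 to 255 range in the regex itself is possible but considerably longer.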
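Quantifiers for matching multiple characters
Character   Function
*           matches the preceding character zero or more times (it may be absent)
+           matches the preceding character one or more times (at least once)
?           matches the preceding character zero or one time (present once or not at all)
{m}         matches the preceding character exactly m times
{m,}        matches the preceding character at least m times
{m,n}       matches the preceding character from m to n times
Character   Function
.           matches any single character (except \n)
[ ]         matches any one of the characters listed inside the brackets
\d          matches a digit, i.e. 0-9
\D          matches a non-digit
\s          matches whitespace, e.g. a space or a tab
\S          matches non-whitespace
\w          matches a word character, i.e. a-z, A-Z, 0-9, _
\W          matches a non-word character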
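The [ ] entry in the table is the only one not exercised in the transcripts above, so here is a brief sketch of it, assuming Python 3. The sample strings and variable names are made up for illustration.

```python
import re

# A set lists the allowed characters explicitly.
vowels = re.findall(r"[aeiou]", "regular expression")
print(vowels)  # ['e', 'u', 'a', 'e', 'e', 'i', 'o']

# Ranges such as A-Z and a-z can be used inside a set.
names = re.findall(r"[A-Z][a-z]+", "Alice met Bob in Paris")
print(names)  # ['Alice', 'Bob', 'Paris']

# A leading ^ negates the set: here, anything except whitespace and commas.
fields = re.findall(r"[^\s,]+", "a, b,c")
print(fields)  # ['a', 'b', 'c']
```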