Python爬虫学习==>第九章：正则表达式基础

时间：2018-04-07 20:03:10 阅读：200 评论：0 收藏：0 [点我收藏+]

学习目的：

　　正则表达式是对字符串操作的一种逻辑公式，就是用事先定义好的一些特点字符、及这些特点字符组合，组成一个“规则字符串”，这个“规则字符串”用来表达对字符串的一种过滤逻辑。

正式步骤

Step1：常用匹配模式

技术分享图片

Step2：最常规的匹配

import re

testString = ‘I have 4Learned the python years‘
print(len(testString))
result = re.match(‘^I\s\w{4}\s\d\w{7}.*years$‘,testString)
print(result)
print(result.group()) #现实匹配结果
print(result.span()　　#现实匹配区间

运行结果：

32
<_sre.SRE_Match object; span=(0, 32), match=‘I have 4Learned the python years‘>
I have 4Learned the python years
(0, 32)

范匹配：

.*可以把除了匹配的开头和结尾都匹配

import re
testString = ‘I have 4Learned the python years‘
print(len(testString))
result = re.match(‘^I.*years$‘,testString)
print(result)
print(result.group())
print(result.span())

匹配目标：

　　设置起始端点后，用()来把需要匹配的目标括号起来

import re

testString = ‘I have Learned the python years‘
print(len(testString))
result = re.match(‘^I\s\w{4}\s(\w+)\s.*years$‘,testString)
print(result)
print(result.group(1))
print(result.span())

贪婪匹配：

import re
testString = ‘I have 7777 Learned the python years‘
print(len(testString))
result = re.match(‘^I.*(\d+).*years$‘,testString)
print(result)
print(result.group(1))
print(result.span())

运行结果：

36
<_sre.SRE_Match object; span=(0, 36), match=‘I have 7777 Learned the python years‘>
7
(0, 36)

非贪婪匹配

import re
testString = ‘I have 7777 Learned the python years‘
print(len(testString))
result = re.match(‘^I.*?(\d+).*years$‘,testString)
print(result)
print(result.group(1))
print(result.span())

运行结果：

36
<_sre.SRE_Match object; span=(0, 36), match=‘I have 7777 Learned the python years‘>
7777
(0, 36)

Step3：匹配模式

包含换行符：

import re
testString = ‘‘‘I have 7777
Learned the python years‘‘‘
print(len(testString))
result = re.match(‘^I.*(\d+).*years$‘,testString,re.S)
print(result)
print(result.group(1))
print(result.span())

转义：

import re
content = "i have $5.00"
result = re.match(‘i have \$5\.00‘,content)
print(result.group())

Step4: re.search

　　功能：扫描整个字符串，返回第一个成功的匹配

import re
testString = ‘‘‘I have 7777
Learned the python years‘‘‘
print(len(testString))
result = re.search(‘I.*(\d+).*years$‘,testString,re.S)
print(result)
print(result.group(1))
print(result.span())

　　总结：为了匹配方便，能用search就不用match，因为search方法不用限制匹配字符串的头部必须一致

学习总结：

　　正则表达式的应用需要多实践

Python爬虫学习==>第九章：正则表达式基础

标签：ima 目标 test 特点正式匹配 ring log content

原文地址：https://www.cnblogs.com/wuzhiming/p/8711852.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行