Tags: python div .json pop find split als out roc
re (continued):
re matches greedily by default.
Greedy mode: when a match is possible, a quantifier consumes the longest string it can.
import re

s = 'askldlaksdabccccccccasdabcccalsdacbcccacbcccabccc'
res = re.findall(r'abc+', s)
print(res)
res = re.findall(r'abc+?', s)  # append ? to the quantifier (+ here) to turn off greedy matching
print(res)

Output:
['abcccccccc', 'abccc', 'abccc']
['abc', 'abc', 'abc']
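The `?` suffix works on every quantifier (`*?`, `+?`, `??`, `{m,n}?`). The difference matters most with `.*`, for example when pulling text out of HTML-like markup, as the scraper later in this post does. A minimal sketch on a made-up string:

```python
import re

html = '<b>first</b> and <b>second</b>'

# Greedy: .* runs as far as possible, so the match spans from the first <b> to the last </b>
greedy = re.findall(r'<b>(.*)</b>', html)
print(greedy)  # ['first</b> and <b>second']

# Lazy: .*? stops at the first </b> that lets the match succeed
lazy = re.findall(r'<b>(.*?)</b>', html)
print(lazy)  # ['first', 'second']
```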
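Common functions in the re module:
re.split(): works like the string method split(), but is more powerful because the separator is a regular expression instead of a fixed string.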
import re

s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'
res = re.split(r'\d', s)
print(res)
res = re.split(r'(\d+)', s)  # wrap the pattern in a capturing group () to keep the separators
print(res)

Output:
['askldlaksdab', 'ccccc.cccas', 'dabc', 'cc.alsdacbcccac.cccab', 'ccc']
['askldlaksdab', '8', 'ccccc.cccas', '8', 'dabc', '8', 'cc.alsdacbcccac.cccab', '8', 'ccc']
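A short sketch of what "more powerful" means in practice: `str.split()` accepts only one literal separator, while `re.split()` accepts a pattern and an optional `maxsplit`. The sample string is made up for illustration.

```python
import re

line = 'apple, banana;cherry  date'

# str.split() takes a single fixed separator string
by_comma = line.split(',')
print(by_comma)  # ['apple', ' banana;cherry  date']

# re.split() takes a pattern: here, any run of commas, semicolons or whitespace
parts = re.split(r'[,;\s]+', line)
print(parts)  # ['apple', 'banana', 'cherry', 'date']

# maxsplit caps the number of splits; the rest stays in the last item
head_rest = re.split(r'[,;\s]+', line, maxsplit=1)
print(head_rest)  # ['apple', 'banana;cherry  date']
```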
re.sub(): substitution, similar to str.replace(), except that the text to be replaced is described by a pattern.
import re

s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'
res = re.sub(r'abc+', '123', s)  # replace every match of abc+ with '123'
print(res)

Output:
askldlaksdab8ccccc.cccas8d1238cc.alsdacbcccac.cccab8ccc
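Beyond a fixed replacement string, `re.sub()` also accepts a `count` limit, group backreferences such as `\1` in the replacement, and a function that computes each replacement; `re.subn()` additionally returns how many substitutions were made. A small sketch with made-up inputs:

```python
import re

s = 'abc abccc xyz abc'

# count limits the number of replacements (here: only the first match)
first_only = re.sub(r'abc+', '123', s, count=1)
print(first_only)  # 123 abccc xyz abc

# \1, \2, \3 in the replacement refer to what each capture group matched
dmy = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', '2017-06-27')
print(dmy)  # 27/06/2017

# The replacement can be a function that receives each match object
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1b22c333')
print(doubled)  # a2b44c666

# re.subn() returns (new_string, number_of_replacements)
result = re.subn(r'abc+', '123', s)
print(result)  # ('123 123 xyz 123', 3)
```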
re.compile(): compiles a pattern into a pattern object that can be reused.
import re

s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'
obj = re.compile(r'\d+')  # compile the pattern once into a pattern object
res = obj.findall(s)      # call methods on the compiled object
print(res)

Output:
['8', '8', '8', '8']
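A compiled pattern object carries its flags with it and exposes the same operations as methods (`search`, `findall`, `sub`, and so on). The `re` module also caches recently used patterns internally, so explicit compilation is mostly about readability and reuse rather than speed. A sketch with made-up inputs:

```python
import re

# Compile once; the IGNORECASE flag becomes part of the pattern object
word = re.compile(r'abc+', re.IGNORECASE)

texts = ['xABCCy', 'no match here', 'abc and AbC']
found = []
for t in texts:
    m = word.search(t)  # first match anywhere in t, or None
    found.append(m.group() if m else None)
print(found)  # ['ABCC', None, 'abc']

all_matches = word.findall('abc and AbC')
print(all_matches)  # ['abc', 'AbC']

replaced = word.sub('#', 'xABCCy')
print(replaced)  # x#y
```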
A small regex exercise: a web scraper (scraping the xiaohuar.com "campus belle" site)
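import json
import re

import requests

url = 'http://www.xiaohuar.com/2014.html'  # "campus belle" ranking, top 120


def req():
    resp = requests.get(url)
    # print('encoding', resp.encoding)
    return resp.text


def run():
    html = req()
    # requests decoded the page as Latin-1 (its fallback when the header has no charset),
    # but the page is really GBK: undo that and re-decode
    html = html.encode('Latin-1').decode('gbk')
    # print(html)
    # capture the ranking number and the name/school; re.S lets . match newlines too
    obj = re.compile(r'<div class="top-title">(.*?)</div>.*?<div class="title">'
                     r'.*?target="_blank">(.*?)</a></span></div>', re.S)
    return obj.findall(html)


dic = {}
for rank, title in run():
    dic[rank] = title

data = json.dumps(dic, ensure_ascii=False)  # serialize
# 'w' instead of 'a': appending leaves several JSON objects in one file after reruns,
# which json.load() cannot parse
with open('xiaohua.json', 'w', encoding='utf-8') as f:
    f.write(data)

with open('xiaohua.json', 'r', encoding='utf-8') as f:
    data = json.load(f)  # deserialize
print(data)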
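Because the page above may change or disappear, the pattern can be checked offline against a hand-written snippet. The HTML below is hypothetical, shaped only to fit what the scraper's pattern expects, not copied from the real site. It also shows why `re.S` is needed: without it, `.` does not match newlines.

```python
import json
import re

# Hypothetical markup shaped like what the scraper's pattern expects;
# the real page layout may differ.
html = '''
<div class="top-title">1</div>
<div class="item">
  <div class="title"><span><a href="/p/1" target="_blank">Name A, School A</a></span></div>
</div>
<div class="top-title">2</div>
<div class="item">
  <div class="title"><span><a href="/p/2" target="_blank">Name B, School B</a></span></div>
</div>
'''

pattern = (r'<div class="top-title">(.*?)</div>.*?<div class="title">'
           r'.*?target="_blank">(.*?)</a></span></div>')

# Without re.S the dot stops at line breaks, so no match can span them
no_dotall = re.findall(pattern, html)
print(no_dotall)  # []

pairs = re.findall(pattern, html, re.S)
print(pairs)  # [('1', 'Name A, School A'), ('2', 'Name B, School B')]

# Same dict-building and JSON round trip as the scraper
ranking = dict(pairs)
text = json.dumps(ranking, ensure_ascii=False)
print(json.loads(text) == ranking)  # True
```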
subprocess:
The subprocess module lets a process spawn a new child process, connect to the child's stdin/stdout/stderr through pipes, and obtain its return code.
import subprocess

# shell=True is needed because dir is a cmd.exe built-in, not an executable
p = subprocess.Popen('dir', shell=True, stdout=subprocess.PIPE)
out, _ = p.communicate()  # read all of stdout and wait for the process to exit
print(out.decode('gbk'))  # a Chinese-locale Windows console writes GBK bytes

Output (Chinese-locale Windows, shown verbatim):
 驱动器 E 中的卷没有标签。
 卷的序列号是 383D-453A

 E:\Python\DAY-15 的目录

2017/06/27  19:52    <DIR>          .
2017/06/27  19:52    <DIR>          ..
2017/06/27  19:52               338 3213.py
2017/06/27  19:47               778 tmp.py
2017/06/27  19:25             9,146 xiaohua.json
               3 个文件         10,262 字节
               2 个目录 117,877,260,288 可用字节
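The `dir` example only works on Windows with a GBK console. A platform-independent sketch, assuming Python 3.5+ for `subprocess.run()`, launches a child Python interpreter via `sys.executable` and shows the return code plus pipes on stdin and stdout. The child scripts passed with `-c` are made up for illustration.

```python
import subprocess
import sys

# Argument-list form: no shell involved, works the same on Windows, Linux and macOS
result = subprocess.run(
    [sys.executable, '-c', 'import sys; print("hello from child"); sys.exit(3)'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
print(result.returncode)               # 3
print(result.stdout.decode().strip())  # hello from child

# Popen with pipes on both ends: send data to the child's stdin, read its stdout
p = subprocess.Popen(
    [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper())'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
out, _ = p.communicate(b'abc')  # writes stdin, closes it, reads stdout, waits
print(out.decode().strip(), p.returncode)  # ABC 0
```

`communicate()` is preferred over reading `p.stdout` directly because it also waits for the child and avoids deadlocks when several pipes are open.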
Python Basics day-13 [Modules: re, subprocess (unfinished)]
Original article: http://www.cnblogs.com/ldsly/p/7086882.html