Python Basics day-13 [modules: re, subprocess (unfinished)]

Posted: 2017-06-27 20:52:56

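re (continued):

By default, quantifiers in re are greedy.

Greedy mode: whenever a match is possible, the quantifier matches the longest string it can.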

import re
s = 'askldlaksdabccccccccasdabcccalsdacbcccacbcccabccc'

res = re.findall(r'abc+', s)     # greedy: + takes as many c's as it can
print(res)

res = re.findall(r'abc+?', s)    # append ? to the quantifier to turn off greedy matching
print(res)

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['abcccccccc', 'abccc', 'abccc']
['abc', 'abc', 'abc']

Process finished with exit code 0
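The difference matters most with `.*`, which is where greedy matching usually surprises people. The following is a minimal sketch on a made-up string (the `text` value is an assumption for illustration, not from the example above):

```python
import re

# Hypothetical sample text, chosen only to show the two behaviours.
text = '<b>one</b> and <b>two</b>'

# Greedy: .* runs on to the LAST </b>, so both tags end up in one match.
greedy = re.findall(r'<b>.*</b>', text)

# Non-greedy: .*? stops at the FIRST </b> it can, giving one match per tag.
lazy = re.findall(r'<b>.*?</b>', text)

print(greedy)   # ['<b>one</b> and <b>two</b>']
print(lazy)     # ['<b>one</b>', '<b>two</b>']
```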

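Commonly used functions in the re module:

re.split(): similar to the string split method, but more powerful, because the separator is a regular expression.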

import re
s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'

res = re.split(r'\d', s)        # split on every digit
print(res)
res = re.split(r'(\d+)', s)     # wrap the pattern in () to keep the separators in the result
print(res)


Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['askldlaksdab', 'ccccc.cccas', 'dabc', 'cc.alsdacbcccac.cccab', 'ccc']
['askldlaksdab', '8', 'ccccc.cccas', '8', 'dabc', '8', 'cc.alsdacbcccac.cccab', '8', 'ccc']

Process finished with exit code 0
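Where re.split() tends to beat str.split() is with several different separators at once. A small sketch, assuming a made-up input line (`line` and the variable names are hypothetical):

```python
import re

# Hypothetical input with mixed separators: comma, semicolon and whitespace.
line = 'alice, bob;carol  dave'

# A character class plus + treats any run of separators as a single split point.
names = re.split(r'[,;\s]+', line)

# maxsplit limits how many splits are made; the rest of the string stays intact.
head_and_rest = re.split(r'[,;\s]+', line, maxsplit=1)

print(names)           # ['alice', 'bob', 'carol', 'dave']
print(head_and_rest)   # ['alice', 'bob;carol  dave']
```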

re.sub(): similar to str.replace, but the text to replace is described by a pattern.

import re
s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'

res = re.sub(r'abc+', '123', s)
print(res)


Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
askldlaksdab8ccccc.cccas8d1238cc.alsdacbcccac.cccab8ccc

Process finished with exit code 0
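re.sub() also accepts a count limit, group references in the replacement, and a function as the replacement. A short sketch of those variants, using a shortened made-up string (an assumption for illustration):

```python
import re

# Hypothetical short string containing three digits.
s = 'ab8ccccc.cccas8dabc8cc'

# count=1 replaces only the first match.
once = re.sub(r'\d', '#', s, count=1)

# \1 in the replacement refers to the first captured group.
wrapped = re.sub(r'(\d)', r'[\1]', s)

# A function receives each match object and returns the replacement text.
doubled = re.sub(r'\d', lambda m: str(int(m.group()) * 2), s)

# re.subn() returns the new string together with the number of replacements.
stripped, n = re.subn(r'\d', '', s)

print(once)          # ab#ccccc.cccas8dabc8cc
print(wrapped)       # ab[8]ccccc.cccas[8]dabc[8]cc
print(doubled)       # ab16ccccc.cccas16dabc16cc
print(stripped, n)   # abccccc.cccasdabccc 3
```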

re.compile(): compiles a pattern into a reusable pattern object.

import re
s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'

obj = re.compile(r'\d+')   # compile the rule into a pattern object
res = obj.findall(s)       # call the matching methods on that object
print(res)

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['8', '8', '8', '8']

Process finished with exit code 0
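Compiling pays off mainly when the same pattern is applied many times, for example once per line of input. A minimal sketch, assuming a hypothetical date pattern and sample lines:

```python
import re

# Hypothetical pattern: a date written as YYYY-MM-DD, captured in three groups.
date_re = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Hypothetical input lines.
lines = ['posted 2017-06-27', 'no date here', 'edited 2017-06-28']

found = []
for line in lines:
    m = date_re.search(line)      # the compiled object is reused on every line
    if m:                         # search() returns None when nothing matches
        found.append(m.groups())  # tuple of the captured groups

print(found)   # [('2017', '06', '27'), ('2017', '06', '28')]
```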

A small crawler as a regex exercise (scraping xiaohuar.com)

import json
import re

import requests

url = 'http://www.xiaohuar.com/2014.html'    # top-120 ranking page

def req():
    resp = requests.get(url)
    # print('encoding', resp.encoding)
    return resp.text

def run():
    html = req()
    # The text was decoded as Latin-1 but the page is GBK:
    # undo the wrong decode, then decode the bytes as GBK.
    html = html.encode('Latin-1').decode('gbk')
    # print(html)
    # Capture the rank label and the "name / school" text; re.S lets . match newlines.
    obj = re.compile('<div class="top-title">(.*?)</div>.*?<div class="title">.*?target="_blank">(.*?)</a></span></div>', re.S)
    res = obj.findall(html)
    return res

dic = {}
res = run()
for x in res:
    dic[x[0]] = x[1]
data = json.dumps(dic)       # serialize
# Open with 'w', not 'a': appending a second JSON object on a rerun would make json.load fail.
with open('xiaohua.json', 'w', encoding='utf-8') as f:
    f.write(data)

with open('xiaohua.json', 'r', encoding='utf-8') as f:
    data = json.load(f)      # deserialize
    print(data)
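The pattern in run() can be tried without any network access. The sketch below runs it against a made-up HTML fragment shaped the way the pattern expects (the markup, names, and hrefs are assumptions, not the real page), and also shows why re.S is needed:

```python
import re

# Hypothetical markup; the real page may differ.
html = '''
<div class="top-title">TOP 1</div>
<div class="title"><span><a href="/p/1.html" target="_blank">Name A, School X</a></span></div>
<div class="top-title">TOP 2</div>
<div class="title"><span><a href="/p/2.html" target="_blank">Name B, School Y</a></span></div>
'''

source = (r'<div class="top-title">(.*?)</div>.*?'
          r'<div class="title">.*?target="_blank">(.*?)</a></span></div>')

# With re.S, . also matches newlines, so .*? can bridge the two <div> lines.
pairs = re.compile(source, re.S).findall(html)

# Without re.S, . stops at the newline between the divs and nothing matches.
pairs_without_flag = re.compile(source).findall(html)

# With two groups, findall returns 2-tuples, which dict() accepts directly.
ranking = dict(pairs)

print(pairs)                # [('TOP 1', 'Name A, School X'), ('TOP 2', 'Name B, School Y')]
print(pairs_without_flag)   # []
print(ranking)
```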

subprocess:

The subprocess module lets a process spawn a new child process, connect to the child's stdin/stdout/stderr through pipes, and obtain the child's return code.

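import subprocess

# dir is a built-in of the Windows shell, so shell=True is required here
s = subprocess.Popen('dir', shell=True, stdout=subprocess.PIPE)
# the pipe yields bytes; a Chinese-locale Windows console emits GBK
print(s.stdout.read().decode('gbk'))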

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
 驱动器 E 中的卷没有标签。
 卷的序列号是 383D-453A

 E:\Python\DAY-15 的目录

2017/06/27  19:52    <DIR>          .
2017/06/27  19:52    <DIR>          ..
2017/06/27  19:52               338 3213.py
2017/06/27  19:47               778 tmp.py
2017/06/27  19:25             9,146 xiaohua.json
               3 个文件         10,262 字节
               2 个目录 117,877,260,288 可用字节


Process finished with exit code 0
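For reading the return code along with the output, subprocess.run() (available since Python 3.5) is a convenient wrapper around Popen. The sketch below is one possible way to do it; it starts the current Python interpreter as the child so that it does not depend on a Windows shell command, and the child's one-line program is made up for illustration:

```python
import subprocess
import sys

# Run a child Python process; passing a list of arguments avoids shell=True.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from child")'],
    stdout=subprocess.PIPE,      # capture the child's stdout through a pipe
    stderr=subprocess.PIPE,      # capture stderr as well
    universal_newlines=True,     # decode the output to str instead of bytes
)

print(result.returncode)         # 0 means the child exited normally
print(result.stdout.strip())     # hello from child
```

universal_newlines=True is used rather than text=True because the latter only exists from Python 3.7 on, while the console output above comes from Python 3.6.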

Original article: http://www.cnblogs.com/ldsly/p/7086882.html
