re.findall("pattern", data_to_parse, flags)  # e.g. re.S so that . also matches newlines
.*?  ----> skip: lazily match anything in between without capturing it
(.*?) ----> extract: lazily match and capture the content
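A minimal sketch of the difference between .*? and (.*?) in re.findall with re.S; the HTML snippet below is made up for illustration:

import re

html = '<div class="item"><a href="/movie/1">\n<span class="title">Movie A</span></a></div>'

# .*? only skips over text (including the newline, thanks to re.S);
# findall returns just the (.*?) capture groups as tuples.
result = re.findall('<a href="(.*?)">.*?<span class="title">(.*?)</span>', html, re.S)
print(result)  # [('/movie/1', 'Movie A')]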
json ----> a universal text-based data-interchange format (handled by Python's built-in json module)
json.dumps() --> serialize Python data into a JSON string (json.dump() writes it straight to a file)
json.loads() --> parse a JSON string back into Python data
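A quick sketch of the json round-trip described above; the dictionary is just example data:

import json

movie = {'name': '肖申克的救赎', 'point': '9.7', 'count': '2000000'}

text = json.dumps(movie, ensure_ascii=False)  # Python dict -> JSON string
print(text)                                   # {"name": "肖申克的救赎", "point": "9.7", "count": "2000000"}

data = json.loads(text)                       # JSON string -> Python dict
print(data['name'])                           # 肖申克的救赎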
import requests
import re
# 0. Build the URL of every Top 250 list page (10 pages, 25 movies each)
num = 0
for page in range(10):
    url = f'https://movie.douban.com/top250?start={num}&filter='  # start = 0, 25, 50, 75 ...
    num += 25
    # 1. Send the request (Douban rejects clients without a browser-like User-Agent)
    response = requests.get(
        url=url,
        headers={'User-Agent': 'Mozilla/5.0'},
    )
    # 2. Extract each movie's detail-page URL, title, rating and vote count
    # movie_name = re.findall('<div class="item">.*?<a href="(.*?)">.*?<span class="title">(.*?)</span>', response.text, re.S)
    movie_list = re.findall(
        '<div class="item">.*?<a href="(.*?)">.*?<span class="title">(.*?)</span>.*?<span class="rating_num" property="v:average">(.*?)</span>.*?<span>(.*?)人评价</span>',
        response.text, re.S)
    # 3. Append one line per movie to douban.txt
    with open('douban.txt', 'a', encoding='utf-8') as f:
        for movie_url, movie_name, movie_point, movie_count in movie_list:
            f.write(movie_url + '---' + movie_name + '---' + movie_point + '---' + movie_count + '\n')

print('写入数据成功,爬虫程序结束...')  # "Data written successfully, crawler finished..."
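To tie this back to the json notes above, here is a minimal sketch that converts the '---'-separated lines written by the script into a JSON file; the output filename douban.json and the field names are my own choices, not part of the original script:

import json

movies = []
with open('douban.txt', encoding='utf-8') as f:
    for line in f:
        url, name, point, count = line.rstrip('\n').split('---')
        movies.append({'url': url, 'name': name, 'point': point, 'count': count})

with open('douban.json', 'w', encoding='utf-8') as f:
    json.dump(movies, f, ensure_ascii=False, indent=2)  # write the list of dicts out as JSON

# Reading it back with json.load(f) returns the same list of dicts.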
Original article: https://www.cnblogs.com/shaozheng/p/11426076.html