Biquge novel - 雪中悍刀行 - crawler source code

Posted: 2019-11-05 21:48:10


import re
import requests
from bs4 import BeautifulSoup

url = 'http://www.biquge6.com/11_11147/'
r = requests.get(url)
b = BeautifulSoup(r.content.decode('gbk'), 'html.parser')
# Match every tag whose href contains '/11_11147/'; find_all returns a list of those tags
h = b.find_all(href=re.compile('/11_11147/'))

list_len = len(h)      # number of matched chapter links
print('Starting download:')
i = 1
for each in h:
    print('Downloading chapter ' + str(i) + ' of ' + str(list_len))
    # Drop the leading '/11_11147/' (10 characters) from the href and append the rest to the index URL
    url1 = url + each.get('href')[10:]
    resp = requests.get(url1)                # fetch the chapter page; named resp so the re module is not shadowed
    bs = BeautifulSoup(resp.content.decode('gbk'), 'html.parser')
    t = bs.find_all('h1')[0].text[1:]        # chapter title: text of the first <h1>, minus its first character

    content = bs.find_all(id='content')[0].text    # .text keeps the chapter text and drops the HTML tags
    # Clean up: collapse runs of 8 non-breaking spaces, remove 4-space indents, normalize line breaks
    content = content.replace('\xa0'*8, ' ').replace('    ', '').replace('\n\r', '\n')
    content = t + '\n\n' + content + '\n\n\n\n'    # join title and body
    with open('雪中悍刀行.doc', 'a', encoding='utf-8') as f:    # plain UTF-8 text, appended chapter by chapter
        f.write(content)
    i += 1
print('Download complete!')
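
The script builds each chapter URL by slicing the href with `[10:]`, which only works while the prefix is exactly `/11_11147/`. As a rough sketch (not part of the original post), the two helpers below show one way to resolve the link with the standard library's `urljoin` and to keep the text cleanup in one place. The function names `chapter_url` and `clean_chapter_text` and the sample href `/11_11147/123.html` are made up for illustration.

```python
from urllib.parse import urljoin

INDEX_URL = 'http://www.biquge6.com/11_11147/'


def chapter_url(index_url, href):
    """Resolve a chapter href (absolute path or relative) against the index URL."""
    return urljoin(index_url, href)


def clean_chapter_text(title, raw):
    """Apply the same replacements as the script above and prepend the title."""
    body = raw.replace('\xa0' * 8, ' ').replace('    ', '').replace('\n\r', '\n')
    return title + '\n\n' + body + '\n\n\n\n'


if __name__ == '__main__':
    # Hypothetical href, used only to show the URL resolution
    print(chapter_url(INDEX_URL, '/11_11147/123.html'))
```

Inside the loop, `url1 = chapter_url(url, each.get('href'))` would replace the slice, assuming the hrefs on the index page are site-relative paths as the original slicing implies.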

----An example in the left hand, a hammer in the right----


Original article: https://www.cnblogs.com/Luoters/p/11801539.html
