
Using bs4 to scrape every chapter title and chapter body of Romance of the Three Kingdoms (三国演义)

Date: 2019-09-30 09:50:52


Target URL: http://www.shicimingju.com/book/sanguoyanyi.html

  

from bs4 import BeautifulSoup
import requests

# Table-of-contents page that links to every chapter
url = 'http://www.shicimingju.com/book/sanguoyanyi.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Mobile Safari/537.36'
}

page_text = requests.get(url=url, headers=headers).text
soup = BeautifulSoup(page_text, 'lxml')
# Every <a> inside the .book-mulu element points to one chapter page
res_list = soup.select('.book-mulu a')
with open('三国演义.text', 'w', encoding='utf-8') as f:
    for item in res_list:
        # The href values are site-relative, so prepend the site root
        url_item = '%s%s' % ('http://www.shicimingju.com', item['href'])
        detail_page_text = requests.get(url=url_item, headers=headers).text
        detail_soup = BeautifulSoup(detail_page_text, 'lxml')
        title = detail_soup.find('div', class_='www-main-container').text
        body = detail_soup.find('div', class_='chapter_content').text
        f.write(title + '\n' + body)
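
The link-extraction step can be tried offline before hitting the site. The sketch below is only an illustration under stated assumptions: SAMPLE_TOC is a made-up stand-in for the table-of-contents page (the real markup and href paths may differ), chapter_links is a hypothetical helper, and it swaps in the built-in html.parser and urllib.parse.urljoin in place of lxml and '%s%s' formatting.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.shicimingju.com"

# Hypothetical stand-in for the table-of-contents page (not the real markup).
SAMPLE_TOC = """
<div class="book-mulu">
  <ul>
    <li><a href="/book/sanguoyanyi/1.html">Chapter 1</a></li>
    <li><a href="/book/sanguoyanyi/2.html">Chapter 2</a></li>
  </ul>
</div>
<a href="/about.html">About</a>
"""


def chapter_links(html, base_url=BASE_URL):
    """Return (title, absolute_url) pairs for links inside .book-mulu."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in soup.select(".book-mulu a")
    ]


links = chapter_links(SAMPLE_TOC)
for title, link in links:
    print(title, link)
```

The descendant selector '.book-mulu a' skips the "About" link because it sits outside the .book-mulu element, and urljoin resolves each site-relative href against the site root without manual string concatenation.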

 


Original article: https://www.cnblogs.com/Jnhnsnow/p/11610821.html
