Crawler: Scraping the Douban Books TOP250

Date: 2018-04-18 01:02:17


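import requests
from bs4 import BeautifulSoup


def get_book(url):
    # Fetch a single book's detail page and parse it with the lxml parser.
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')

    # Title: the <span> directly inside the page's <h1>.
    title_list = soup.select('h1 > span')
    title = title_list[0].text

    # Author: the first link directly inside div#info, with spaces and newlines stripped.
    author_list = soup.select('div#info > a')
    author = author_list[0].text.replace(" ", "").replace("\n", "")

    # Score: the rating number shown on the page.
    score_list = soup.select('strong.ll.rating_num')
    score = score_list[0].text

    data = {
        'title': title,
        'score': score,
        'author': author,
    }

    print(data)


def get_all_book():
    # The TOP250 list spans 10 pages of 25 books each: start=0, 25, ..., 225.
    for i in range(0, 250, 25):
        url = 'https://book.douban.com/top250?start=' + str(i)
        wb_data = requests.get(url)
        soup = BeautifulSoup(wb_data.text, 'lxml')

        # Each entry on the list page links to its book's detail page.
        href_list = soup.select('div.pl2 > a')
        for href in href_list:
            link = href.get('href')
            get_book(link)


if __name__ == '__main__':
    get_all_book()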
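
The list pages in get_all_book() are addressed by a start offset that advances 25 books at a time. As a minimal sketch of that pagination step alone, with no network access, the helper below builds the ten list-page URLs. The names build_page_urls, BASE_URL, total, and page_size are hypothetical and not part of the original script.

```python
# Illustrative only: BASE_URL mirrors the URL prefix used in get_all_book().
BASE_URL = "https://book.douban.com/top250?start="


def build_page_urls(total=250, page_size=25):
    """Return one list-page URL per offset 0, page_size, 2 * page_size, ... below total."""
    return [BASE_URL + str(offset) for offset in range(0, total, page_size)]


if __name__ == "__main__":
    for url in build_page_urls():
        print(url)
```

Keeping URL construction separate from the requests would make it possible to check the offsets before any page is fetched.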


Original source: https://www.cnblogs.com/hiss/p/8870792.html
