
Web Scraper: Scraping the Douban Books TOP250

Date: 2018-04-18 01:02:17


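import requests
from bs4 import BeautifulSoup


def get_book(url):
    """Fetch one book's detail page and print its title, score, and author."""
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, "lxml")
    # select() returns a list of matches; take the first match for each field.
    title_list = soup.select("h1 > span")
    title = title_list[0].text
    author_list = soup.select("div#info > a")
    # Strip the spaces and newlines that surround the author name in the markup.
    author = author_list[0].text.replace(" ", "").replace("\n", "")
    score_list = soup.select("strong.ll.rating_num")
    score = score_list[0].text

    data = {
        "title": title,
        "score": score,
        "author": author,
    }

    print(data)


def get_all_book():
    """Walk the 10 listing pages (25 books each) and scrape every linked book."""
    for i in range(0, 250, 25):
        url = "https://book.douban.com/top250?start=" + str(i)
        wb_data = requests.get(url)
        soup = BeautifulSoup(wb_data.text, "lxml")
        # div.pl2 > a selects the link to each book's detail page.
        href_list = soup.select("div.pl2 > a")
        for href in href_list:
            link = href.get("href")
            get_book(link)


get_all_book()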
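
The paging in get_all_book() relies on the start query parameter advancing in steps of 25 from 0 up to 225. As a rough sketch of that one step, the listing URLs can be built separately from the fetching, which makes the paging easy to check without touching the network. The helper name page_urls and its parameters total and page_size are hypothetical, introduced here only for illustration; they are not part of the original script.

```python
from urllib.parse import urlencode

# Listing URL taken from the script above.
BASE_URL = "https://book.douban.com/top250"


def page_urls(total=250, page_size=25):
    """Return the listing-page URLs needed to cover `total` books.

    `total` and `page_size` are illustrative parameters; the defaults
    mirror range(0, 250, 25) in the original loop.
    """
    return [
        f"{BASE_URL}?{urlencode({'start': start})}"
        for start in range(0, total, page_size)
    ]


if __name__ == "__main__":
    for url in page_urls():
        print(url)
```

With the defaults this yields ten URLs, from start=0 through start=225, which is the same sequence the original loop requests one by one.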

 


Original source: https://www.cnblogs.com/hiss/p/8870792.html
