Scraping 1000 list pages from Xiaozhu (小猪短租) short-term rentals

Date: 2017-05-31 22:11:58

Tags: format, with, bs4, .com, print, title, and, zip, python

The code is as follows:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

from bs4 import BeautifulSoup
import requests


def get_page_within(pages):
    # Fetch list pages 1 through `pages` (inclusive).
    for page in range(1, pages + 1):
        wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))
        soup = BeautifulSoup(wb.text, 'lxml')
        # Select the title and price elements of each listing.
        titles = soup.select('span.result_title')
        prices = soup.select('span.result_price > i')
        # Pair each title with its price and print the pair as a dict.
        for title, price in zip(titles, prices):
            data = {
                'title': title.get_text(),
                'price': price.get_text()
            }
            print(data)


get_page_within(pages=1000)
Here is an explanation of the code, piece by piece.

from bs4 import BeautifulSoup
import requests
Import the two libraries used here, BeautifulSoup (from the bs4 package) and requests.

def get_page_within(pages):
Define a function that fetches the data from the first `pages` list pages.

for page in range(1, pages+1):
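Loop over the page numbers from 1 to pages. The upper bound of range is exclusive, so it is written as pages+1 to include the last page.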

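wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))

.format substitutes the current page number for the {} placeholder, so each pass of the for loop produces a different URL, and requests.get downloads that page. Over the whole loop this fetches pages URLs in total.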
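To see the URL substitution on its own, here is a minimal sketch that only builds the URLs and makes no requests. The names `URL_TEMPLATE` and `build_urls` are illustrative and do not appear in the original code.

```python
# Illustrative helper (not part of the original script): build the list-page
# URLs without downloading anything.
URL_TEMPLATE = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'


def build_urls(pages):
    # range(1, pages + 1) yields 1, 2, ..., pages
    return [URL_TEMPLATE.format(page) for page in range(1, pages + 1)]


urls = build_urls(3)
for url in urls:
    print(url)
```

Each printed URL differs only in the number that follows `-p`.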

soup = BeautifulSoup(wb.text, 'lxml')
Parse the downloaded HTML with BeautifulSoup, using lxml as the parser.

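titles = soup.select('span.result_title')

prices = soup.select('span.result_price > i')
Select the title and price elements with CSS selectors: every span with class result_title, and every i element that is a direct child of a span with class result_price.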
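The two selectors can be tried offline against a small HTML fragment. The sketch below assumes a made-up fragment (`SAMPLE_HTML`) that only mimics the class names used above; the real page markup may differ. It uses Python's built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the class names; not copied from the site.
SAMPLE_HTML = """
<div>
  <span class="result_title">Cozy room near subway</span>
  <span class="result_price">&#165;<i>298</i>/night</span>
</div>
<div>
  <span class="result_title">Studio in Chaoyang</span>
  <span class="result_price">&#165;<i>458</i>/night</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# 'span.result_title' matches every <span> with class result_title.
titles = [tag.get_text() for tag in soup.select('span.result_title')]
# 'span.result_price > i' matches <i> elements directly inside such a span.
prices = [tag.get_text() for tag in soup.select('span.result_price > i')]

print(titles)
print(prices)
```

Because the price selector targets only the `i` element, the currency sign and the "/night" text around it are left out.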

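        for title, price in zip(titles, prices):
            data = {
                'title': title.get_text(),
                'price': price.get_text()
            }
            print(data)
zip pairs each title with the price at the same position. The text of each pair is stored in a dictionary, which is then printed.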
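One property of zip is worth knowing here: it stops at the shorter of its inputs, so if a page yields more titles than prices, the extra titles are silently dropped. A small sketch with made-up values:

```python
# Made-up values; three titles but only two prices.
titles = ['Room A', 'Room B', 'Room C']
prices = ['298', '458']

# zip stops when the shorter list runs out, so 'Room C' is not paired.
listings = [{'title': t, 'price': p} for t, p in zip(titles, prices)]

for item in listings:
    print(item)
```

Comparing `len(titles)` with `len(prices)` before zipping is one simple way to notice such a mismatch.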

get_page_within(pages=1000)
Call the function with pages=1000 to run it over 1000 list pages.
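With 1000 requests in a row, a single failed download would stop the whole run. The following is a hedged sketch of one way to keep going, not the original author's method: `crawl`, `fetch`, `parse`, and `delay` are hypothetical names, and the demonstration uses stand-in functions so it runs without network access.

```python
import time

URL_TEMPLATE = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'


def crawl(pages, fetch, parse, delay=1.0):
    """Fetch pages 1..pages, skipping any page whose fetch fails.

    fetch(url) returns HTML text and parse(html) returns a list of dicts;
    both are hypothetical callables supplied by the caller.
    """
    results = []
    for page in range(1, pages + 1):
        url = URL_TEMPLATE.format(page)
        try:
            html = fetch(url)
        except Exception as exc:
            print('skipping {}: {}'.format(url, exc))
            continue
        results.extend(parse(html))
        time.sleep(delay)  # pause between requests
    return results


# Offline demonstration with stand-in callables (no network access).
def fake_fetch(url):
    if 'p2-' in url:
        raise IOError('simulated timeout')
    return url


def fake_parse(html):
    return [{'title': html, 'price': '0'}]


demo = crawl(3, fake_fetch, fake_parse, delay=0)
print(len(demo))
```

For a real run, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, and `parse` could wrap the BeautifulSoup selection described above.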


Original article: http://www.cnblogs.com/gttpython/p/6926034.html
