
Scraping 1,000 listing pages from Xiaozhu short-term rentals

Date: 2017-05-31 22:11:58

Tags: format, with, bs4, .com, print, title, and, zip, python

The code is as follows:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

from bs4 import BeautifulSoup
import requests


def get_page_within(pages):
    # Fetch listing pages 1..pages and print each listing's title and price.
    for page in range(1, pages+1):
        wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))
        soup = BeautifulSoup(wb.text, 'lxml')
        titles = soup.select('span.result_title')
        prices = soup.select('span.result_price > i')
        # Pair each title with the price at the same position.
        for title, price in zip(titles, prices):
            date = {
                'title': title.get_text(),
                'price': price.get_text()
            }
            print(date)


get_page_within(pages=1000)

A walkthrough of the code:
from bs4 import BeautifulSoup
import requests
Import the two libraries, BeautifulSoup and requests.
def get_page_within(pages):
Define a function that fetches the data from pages listing pages.
for page in range(1, pages+1):
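Loop over the page numbers from 1 through pages inclusive. range(1, pages+1) excludes its upper bound, so it yields exactly pages values, not pages+1.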
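
The bounds of range are easy to misread, so here is a minimal check. The value 3 is just an illustrative choice, not something from the original script:

```python
# range(start, stop) includes start and excludes stop.
pages = 3  # illustrative value; the original script uses 1000
page_numbers = list(range(1, pages + 1))

print(page_numbers)       # [1, 2, 3]
print(len(page_numbers))  # 3
```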

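wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))
.format substitutes the current page number into the {} placeholder in the URL. Combined with the for loop, requests.get downloads the content of each of the pages URLs in turn.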
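
To see what .format produces without making any requests, the URLs can be built on their own. This is a small sketch; the names url_template and urls are introduced here for illustration and do not appear in the original script:

```python
# Same URL pattern as in the script; {} is replaced by the page number.
url_template = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'

# Build the URLs for the first three pages only.
urls = [url_template.format(page) for page in range(1, 4)]

for url in urls:
    print(url)
```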
soup = BeautifulSoup(wb.text, 'lxml')
Parse the downloaded HTML with BeautifulSoup, using the lxml parser.

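titles = soup.select('span.result_title')
prices = soup.select('span.result_price > i')
Select the title and price elements with CSS selectors: every span with class result_title, and every i element that is a direct child of a span with class result_price.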
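
The two selectors can be tried offline on a small HTML fragment. The markup below is a made-up stand-in, not the real page structure of the site, and the sketch uses Python's built-in html.parser instead of lxml so that it only depends on bs4:

```python
from bs4 import BeautifulSoup

# Hypothetical markup that merely matches the two selectors used in the script.
sample_html = '''
<ul>
  <li><span class="result_title">Room A</span><span class="result_price"><i>128</i></span></li>
  <li><span class="result_title">Room B</span><span class="result_price"><i>256</i></span></li>
</ul>
'''

soup = BeautifulSoup(sample_html, 'html.parser')

# select() returns a list of matching tags; get_text() extracts their text.
titles = [tag.get_text() for tag in soup.select('span.result_title')]
prices = [tag.get_text() for tag in soup.select('span.result_price > i')]

print(titles)
print(prices)
```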
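for title, price in zip(titles, prices):
    date = {
        'title': title.get_text(),
        'price': price.get_text()
    }
    print(date)
zip pairs each title with the price at the same position. Each pair is stored in a dictionary, which is then printed.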
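
One thing to be aware of is that zip stops at the shorter of its inputs. If a listing were missing its price element, the remaining titles and prices would be paired by position and could silently shift or be dropped. The following sketch uses invented lists to show the difference between zip and itertools.zip_longest:

```python
from itertools import zip_longest

# Invented data: three titles but only two prices.
titles = ['Room A', 'Room B', 'Room C']
prices = ['128', '256']

# zip drops the unmatched title without any warning.
paired = list(zip(titles, prices))

# zip_longest keeps it and fills the gap, which makes the mismatch visible.
padded = list(zip_longest(titles, prices, fillvalue=None))

print(paired)
print(padded)
```

Neither function can tell which listing lacked a price, so comparing len(titles) with len(prices) before pairing is a cheap sanity check.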
get_page_within(pages=1000)
Call the function with pages=1000 to run it.
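
Running 1,000 requests back to back with no timeout or error handling means one failed request stops the whole run. Below is a hedged sketch of a more defensive loop, not the original author's method. The names fetch_with_requests and fetch_pages, the 10-second timeout, and the 1-second delay are all assumptions made for illustration:

```python
import time

URL_TEMPLATE = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'


def fetch_with_requests(url):
    # Hypothetical helper: download one page, failing fast on errors.
    import requests  # imported here so the loop below can be tried without it
    response = requests.get(url, timeout=10)  # assumed timeout value
    response.raise_for_status()               # raise on 4xx/5xx responses
    return response.text


def fetch_pages(pages, fetch=fetch_with_requests, delay=1.0):
    # Return {page_number: html} for every page that was fetched successfully.
    results = {}
    for page in range(1, pages + 1):
        url = URL_TEMPLATE.format(page)
        try:
            results[page] = fetch(url)
        except Exception as exc:
            # Report the failure and continue with the next page.
            print('page {} failed: {}'.format(page, exc))
        time.sleep(delay)  # assumed pause between requests
    return results


# Example call (performs real network requests):
# html_by_page = fetch_pages(3)
```

Passing the download function in as a parameter keeps the loop testable offline: any callable that takes a URL and returns text can stand in for fetch_with_requests.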









Original article: http://www.cnblogs.com/gttpython/p/6926034.html
