Scraping Anjuke house price data for Suzhou with Python

Posted: 2018-04-08 22:31:35 | Views: 901 | Comments: 0 | Favorites: 0


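Below is the scraper I wrote in Python to collect house price data. The program runs to completion, but for some reason no rows ever arrive in the database. Could someone help me take a look?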

# Import the modules we need, connect to the MySQL database (the one viewed in SQLyog), and create a cursor

import requests
import re
from lxml import etree
import pymysql
import time
conn = pymysql.connect(host='localhost', user='root', passwd='1234', db='mydatabase1', port=3306, charset='utf8')
cursor = conn.cursor()
# Request headers that mimic a browser, so the site is less likely to reject the request or ban the IP:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
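# Fetch one listing page and visit every house link found on it:
def get_house_url(url):
    html = requests.get(url, headers=headers)  # request the page with the browser-like headers
    selector = etree.HTML(html.text)  # parse the HTML source into an element tree
    house_hrefs = selector.xpath('//div[@class="house-title"]/a/@href')  # collect the listing links
    for house_href in house_hrefs:
        get_house_info(house_href)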

def get_house_info(url):  # extract the details from a single listing page
    html = requests.get(url, headers=headers)
    selector = etree.HTML(html.text)  # request the page with the browser-like headers and parse it into an element tree
    try:
        name = selector.xpath('//*[@id="content"]/div[2]/h3/text()')[0]  # each XPath below picks out one field
        village = selector.xpath('//*[@id="content"]/div[3]/div[1]/div[3]/div/div[1]/div/div[1]/dl[1]/dd/a/text()')[0]
        price = selector.xpath('//*[@id="content"]/div[3]/div[1]/div[1]/span[1]/em/text()')[0]
        style = selector.xpath('//*[@id="content"]/div[3]/div[1]/div[3]/div/div[1]/div/div[2]/dl[1]/dd/text()')[0]
        area = selector.xpath('//*[@id="content"]/div[3]/div[1]/div[1]/span[3]/em/text()')[0]
        unit_price = selector.xpath('//*[@id="content"]/div[3]/div[1]/div[3]/div/div[1]/div/div[3]/dl[2]/dd/text()')[0]
        cursor.execute("insert into suzhou_house(name,village,price,style,area,unit_price) values(%s,%s,%s,%s,%s,%s)", (str(name), str(village), str(price), str(style), str(area), str(unit_price)))

    except IndexError:
        # Raised when any XPath above matches nothing; the listing is then skipped silently and no row is inserted
        pass

if __name__ == '__main__':
    urls = ['https://suzhou.anjuke.com/sale/p{}-rd1/?kw=%E8%8B%8F%E5%B7%9E'.format(str(i)) for i in range(1, 6)]
    for url in urls:
        get_house_url(url)
        time.sleep(2)
    conn.commit()
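
One possible explanation worth checking: get_house_info swallows IndexError, so if any of the XPath expressions matches nothing (for example because the page layout differs or the site returned a verification page), the insert is skipped without any message and the script still finishes normally. The following is only a sketch of how such misses could be made visible. The helper names first_match and collect_row, the logger name, and the sample values are all hypothetical and not part of the original program; the helpers take the plain lists that selector.xpath(...) returns.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("anjuke_scraper")  # hypothetical logger name


def first_match(results, field, url):
    """Return the first item of an XPath result list, or None if it is empty.

    `results` is the list returned by selector.xpath(...); an empty list
    means the expression matched nothing on that page.
    """
    if not results:
        log.warning("no match for field %r on %s", field, url)
        return None
    return str(results[0])


def collect_row(fields, url):
    """Build a row tuple from a {field name: XPath result list} mapping.

    Returns None when any field is missing, so the caller can skip the
    insert while the log still records which field failed.
    """
    row = [first_match(results, name, url) for name, results in fields.items()]
    return None if None in row else tuple(row)


# Made-up result lists standing in for real XPath output (not real Anjuke data).
ok = collect_row({"name": ["Sample listing"], "price": ["150"]},
                 "http://example.invalid/1")
missing = collect_row({"name": [], "price": ["150"]},
                      "http://example.invalid/2")
print(ok)
print(missing)
```

If every listing logs a warning, the problem lies in the XPath expressions or in the HTML that was actually returned, not in the database.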

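That is the scraper I wrote. For some reason, no data shows up in the database when I look at it in SQLyog. Readers, please help me take a look.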
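
To rule out the database side, it may help to check two things separately: whether an insert actually ran (cursor.rowcount) and whether it was committed. pymysql leaves autocommit off by default, so on a transactional table other sessions such as SQLyog only see rows after conn.commit(). The sketch below illustrates that behaviour with Python's built-in sqlite3 module as a stand-in for MySQL, purely so that it runs without a server. The two-column table and the sample row are simplified assumptions, and sqlite3 uses ? placeholders where pymysql uses %s.

```python
import os
import sqlite3
import tempfile

# sqlite3 stands in for MySQL here; the file lives in a throwaway directory.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")


def count_rows(conn):
    """Count the rows that this connection can currently see."""
    cur = conn.execute("select count(*) from suzhou_house")
    (n,) = cur.fetchone()
    cur.close()
    return n


# The writer plays the role of the scraper's connection.
writer = sqlite3.connect(db_path)
writer.execute("create table suzhou_house (name text, price text)")
writer.commit()

cur = writer.cursor()
cur.execute("insert into suzhou_house (name, price) values (?, ?)",
            ("Sample listing", "150"))
inserted = cur.rowcount  # number of rows affected by the insert

# A second connection plays the role of SQLyog looking at the table.
viewer = sqlite3.connect(db_path)
before_commit = count_rows(viewer)

writer.commit()
after_commit = count_rows(viewer)

print(inserted, before_commit, after_commit)

writer.close()
viewer.close()
```

In the scraper, printing cursor.rowcount after each execute would show the same distinction: if it never prints, no insert was attempted; if it prints 1 but the table stays empty, the commit or the connection target is the place to look.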


Original article: https://www.cnblogs.com/A1006438/p/8747371.html
