Tags: rip option strip() text headers strong clear python request
Taking new housing developments in Changsha as an example, let's see what housing prices look like there. url = https://cs.newhouse.fang.com/house/s/b91/
1. The page
2. Analyzing the page source
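The data we want is each development's name and price. Start by analyzing the page source: right-click, choose Inspect, and open the XPath browser plugin. First, find the region that holds the data to extract and select it, which jumps to the corresponding position in the source. Then right-click, copy the XPath, and paste it into the plugin. The RESULTS panel on the right side of the plugin shows one match, which is the yellow-highlighted region below: the data we want to extract.
That covers the web page. Next, we fetch the data with code.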
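To make the selection logic concrete, here is a minimal sketch of the same two-step lookup: first select every list item in the listing container, then read the name and price relative to each item. It runs on a small made-up snippet rather than the live page, and it uses the standard library's `xml.etree.ElementTree` as a stand-in for `lxml` so that it runs without extra packages. `SAMPLE`, `extract`, and the estate names are assumptions for illustration; only the class names come from this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modeled on the class names used in this post.
# The real page is much larger and its structure may change over time.
SAMPLE = """
<html><body>
<div class="nl_con clearfix">
  <ul>
    <li>
      <div class="nlcd_name"><a> Sample Estate A </a></div>
      <div class="nhouse_price"><span>12000</span></div>
    </li>
    <li>
      <div class="nlcd_name"><a>Sample Estate B</a></div>
      <div class="nhouse_price"><span>价格待定</span></div>
    </li>
  </ul>
</div>
</body></html>
"""


def extract(markup):
    """Return (name, price_text) pairs for every list item that has both."""
    root = ET.fromstring(markup)
    rows = []
    # Step 1: select each <li> inside the listing container.
    for li in root.findall(".//div[@class='nl_con clearfix']/ul/li"):
        # Step 2: look up name and price relative to the current <li>.
        name = li.find(".//div[@class='nlcd_name']/a")
        price = li.find(".//div[@class='nhouse_price']/span")
        if name is None or price is None:
            continue
        rows.append(((name.text or "").strip(), (price.text or "").strip()))
    return rows


rows = extract(SAMPLE)
print(rows)
```

ElementTree supports only a subset of XPath (for example, it has no `text()` step, so the sketch reads `.text` instead), and it needs well-formed markup. For real-world HTML, `lxml.etree.HTML` as used in this post is the more forgiving choice.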
3. Code implementation
Resulting chart:
play.py
#!/usr/bin/env python
# _*_ coding: UTF-8 _*_
"""=================================================
@Project -> File : Operate_system_ModeView_structure -> play.py
@IDE : PyCharm
@Author : zihan
@Date : 2020/5/6 14:59
@Desc :
================================================="""
import requests
from lxml import etree
from pyecharts.charts import Bar
import pyecharts.options as opts


def getData():
    url = "https://cs.newhouse.fang.com/house/s/b91/"
    headers = {
        # Left empty in the original post; supply your own browser's User-Agent string.
        'User-Agent': ""
    }
    response = requests.get(url, headers=headers)  # send the request
    data = response.content.decode(encoding='gbk')  # decode the response body as GBK
    html = etree.HTML(data)
    # One <li> per housing development in the listing container.
    house_list = html.xpath('//div[@class="nl_con clearfix"]/ul/li')
    names = []
    prices = []
    for i in house_list:
        name = i.xpath('.//div[@class="nlcd_name"]/a/text()')
        price = i.xpath('.//div[@class="nhouse_price"]/span/text()')
        # Skip items with no name or price, and items whose price is
        # listed as '价格待定' ("price to be determined").
        if name != [] and price != []:
            if price != ['价格待定']:
                name = name[0].strip()
                names.append(name)
                price = price[0]
                prices.append(price)
    return names, prices


def main():
    print("main() func is starting...")
    names, prices = getData()
    # print(names)
    # print(prices)
    bar = Bar()
    bar.add_xaxis(names)
    bar.add_yaxis('长沙房价图', prices)  # series name: "Changsha housing prices"
    bar.set_global_opts(
        xaxis_opts=opts.AxisOpts(
            axislabel_opts=opts.LabelOpts(rotate=40),  # rotate labels so long names stay readable
        ),
        yaxis_opts=opts.AxisOpts(name="价格(元、平方米)"),  # "Price (yuan per square meter)"
        title_opts=opts.TitleOpts(title="柱状图")  # "Bar chart"
    )
    bar.render('房价图.html')  # writes the chart to an HTML file


if __name__ == '__main__':
    main()

That's it.
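Note that `getData()` returns the prices exactly as scraped, that is, as strings. One possible refinement, sketched here under the assumption that valid prices are plain whole numbers, is to convert them to integers and drop anything that does not parse before charting. The helper `clean_listings` and the sample values are hypothetical and not part of the original script.

```python
def clean_listings(names, prices):
    """Pair names with prices, keeping only prices that parse as integers.

    Returns a list of (name, price) tuples sorted from most to least expensive.
    """
    cleaned = []
    for name, price in zip(names, prices):
        try:
            value = int(price.strip())
        except ValueError:
            # Non-numeric text such as '价格待定' ("price to be determined").
            continue
        cleaned.append((name, value))
    cleaned.sort(key=lambda item: item[1], reverse=True)
    return cleaned


# Made-up sample data for illustration only.
sample_names = ["Estate A", "Estate B", "Estate C"]
sample_prices = ["9500", "价格待定", " 12000 "]
listings = clean_listings(sample_names, sample_prices)
print(listings)
```

The cleaned pairs can then be split back into two lists, for example with `[n for n, _ in listings]` and `[p for _, p in listings]`, and passed to `bar.add_xaxis` and `bar.add_yaxis` in place of the raw values.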
Original article: https://www.cnblogs.com/smart-zihan/p/12838340.html