
《Python编程锦囊》 (Python Programming Tips): Web Crawlers

Date: 2020-04-19 19:37:08


1. Scraping the China Weather website

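#!/usr/bin/env python3
# Import the HTTP request module
import requests
# Import the JSON module
import json
# Request headers: set the two important values taken from the browser's
# network tools, "User-Agent" and "Referer"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36",
    "Referer": "http://www.weather.com.cn/weather1d/101010100.shtml",
}
# URL to request
url = "http://d1.weather.com.cn/sk_2d/101010100.html?_=1555986533582"
# Send the request
response = requests.get(url, headers=headers)
# Check whether the request succeeded
if response.status_code == 200:
    # Set the encoding used to decode the response body
    response.encoding = "utf-8"
    print("Request sent; returned page content:", response.text)
    # Strip the "var dataSK =" prefix so that only the JSON string remains
    json_str = response.text.replace("var dataSK =", "")
    print("Before decoding, the returned JSON string:", json_str)
    # Convert the JSON string to a dict
    json_info = json.loads(json_str)
    # Print the resulting dict
    print("After decoding, the returned dict:", json_info)
    # Print the city
    print("City:", json_info["cityname"])
    # Print the temperature
    print("Current temperature:", json_info["temp"])
    # Print the humidity
    print("Relative humidity:", json_info["SD"])
    # Print the wind direction and force
    print("Wind direction and force:", json_info["WD"], json_info["WS"])
    # Print the air quality
    print("Air quality (PM2.5):", json_info["aqi_pm25"])
    # Print the vehicle number-plate restriction
    print("Vehicle restriction:", json_info["limitnumber"])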

Result:

Request sent; returned page content: var dataSK = {"nameen":"beijing","cityname":"北京","city":"101010100","temp":"18","tempf":"64","WD":"西风","wde":"W","WS":"3级","wse":"<12km/h","SD":"53%","time":"18:22","weather":"多云","weathere":"Cloudy","weathercode":"d01","qy":"1002","njd":"8.93km","sd":"53%","rain":"0.0","rain24h":"0","aqi":"67","limitnumber":"不限行","aqi_pm25":"67","date":"04月19日(星期日)"}
Before decoding, the returned JSON string:  {"nameen":"beijing","cityname":"北京","city":"101010100","temp":"18","tempf":"64","WD":"西风","wde":"W","WS":"3级","wse":"<12km/h","SD":"53%","time":"18:22","weather":"多云","weathere":"Cloudy","weathercode":"d01","qy":"1002","njd":"8.93km","sd":"53%","rain":"0.0","rain24h":"0","aqi":"67","limitnumber":"不限行","aqi_pm25":"67","date":"04月19日(星期日)"}
After decoding, the returned dict: {'nameen': 'beijing', 'cityname': '北京', 'city': '101010100', 'temp': '18', 'tempf': '64', 'WD': '西风', 'wde': 'W', 'WS': '3级', 'wse': '<12km/h', 'SD': '53%', 'time': '18:22', 'weather': '多云', 'weathere': 'Cloudy', 'weathercode': 'd01', 'qy': '1002', 'njd': '8.93km', 'sd': '53%', 'rain': '0.0', 'rain24h': '0', 'aqi': '67', 'limitnumber': '不限行', 'aqi_pm25': '67', 'date': '04月19日(星期日)'}
City: 北京
Current temperature: 18
Relative humidity: 53%
Wind direction and force: 西风 3级
Air quality (PM2.5): 67
Vehicle restriction: 不限行
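
The output above shows that the endpoint returns a JavaScript assignment ("var dataSK = {...}") rather than bare JSON. The following is a minimal sketch of a slightly more defensive way to pull out the object, assuming the response always contains exactly one JSON object. The helper name `extract_json` is hypothetical and not part of the original code, and the sample string is a shortened copy of the response shown above.

```python
import json


def extract_json(text):
    """Return the dict embedded in a string such as 'var dataSK = {...}'.

    Hypothetical helper: it slices from the first '{' to the last '}'
    instead of relying on the exact 'var dataSK =' prefix.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response text")
    return json.loads(text[start:end + 1])


# Shortened copy of the response text shown in the result above
sample = 'var dataSK = {"cityname":"北京","temp":"18","SD":"53%","WD":"西风","WS":"3级"}'
info = extract_json(sample)
print(info["cityname"], info["temp"], info["SD"])
```

Note that every value in the payload is a string (for example "18" and "53%"), so any arithmetic on the temperature or humidity would need an explicit conversion first.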

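Note: to obtain the User-Agent and Referer values, open Google Chrome, go to the China Weather site at http://www.weather.com.cn/weather1d/101010100.shtml, then right-click the page and choose Inspect. The developer tools panel opens:

[Screenshot: developer tools panel]

Select Network > Headers and copy the User-Agent and Referer values from the request headers.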
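
Once the two values have been copied from the Headers tab, they go into the request as a plain dict. The sketch below wraps that step in a hypothetical helper, `build_request`. It rests on two assumptions that the original does not state: that the `_` query parameter is only a millisecond timestamp used as a cache buster, and that other city codes follow the same URL pattern as 101010100.

```python
import time

# User-Agent string copied from the browser, as in the script above
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36")


def build_request(city_code, now_ms=None):
    """Return (url, headers) for the live-weather endpoint of one city.

    Hypothetical helper; the URL pattern and the meaning of the "_"
    parameter are assumptions inferred from the single example URL.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    url = f"http://d1.weather.com.cn/sk_2d/{city_code}.html?_={now_ms}"
    headers = {
        "User-Agent": USER_AGENT,
        "Referer": f"http://www.weather.com.cn/weather1d/{city_code}.shtml",
    }
    return url, headers


# Reproduce the URL used in the script above
url, headers = build_request("101010100", now_ms=1555986533582)
print(url)
```

The returned pair can be passed straight to `requests.get(url, headers=headers)`, exactly as in the script above.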

 


Original article: https://www.cnblogs.com/xiao02fang/p/12732921.html
