Web Crawling - Day 02 - Fetching and Analysis

Posted: 2018-05-09 14:48:44

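###Page Fetching###
1. urllib3
    A powerful, easy-to-use HTTP client that fills gaps in Python's standard library, such as thread-safe connection pooling, retry and redirect handling, and client-side TLS certificate verification.
    Install: pip install urllib3
    Usage: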
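import urllib3

http = urllib3.PoolManager()  # reuses pooled connections for each host
response = http.request('GET', 'http://news.qq.com')
print(response.headers)  # response headers
# response.data is raw bytes; the author decodes it as GBK, the page's encoding at the time
result = response.data.decode('gbk')
print(result)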
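Hard-coding `decode('gbk')` only works while the site keeps that encoding. A minimal sketch, assuming the server declares its charset in the `Content-Type` header, reads that charset with the standard library's `email.message.Message.get_content_charset` and falls back to a default otherwise. `decode_body` is a hypothetical helper name, not part of urllib3.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Hypothetical helper: decode a response body using the charset
    declared in a Content-Type header value, else `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the lower-cased charset parameter, or `default` if there is none
    charset = msg.get_content_charset(default)
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError
    return body.decode(charset, errors="replace")


# Usage with urllib3 (network access, not executed here):
#   text = decode_body(response.data, response.headers.get("Content-Type", ""))
```

The fallback matters because many pages omit the charset parameter; in that case the default (here UTF-8, or GBK for a site like the one above) is used.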
 
Sending HTTPS requests
Install the dependency: pip install certifi
import certifi
import urllib3

# Verify server certificates against the CA bundle shipped with certifi
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
resp = http.request('GET', 'https://news.baidu.com/')
print(resp.data.decode('utf-8'))
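`cert_reqs='CERT_REQUIRED'` makes the client reject any server whose certificate cannot be verified against the CA bundle in `ca_certs`. For comparison, the standard library expresses the same policy through an `ssl.SSLContext`. The sketch below builds one for use with `urllib.request`; the helper name `make_verified_context` is hypothetical, and passing `certifi.where()` as `cafile` assumes certifi is installed.

```python
import ssl
from typing import Optional


def make_verified_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical helper: a client-side TLS context that verifies certificates.

    cafile: path to a PEM CA bundle (e.g. certifi.where()); None loads the
    system's default CA certificates instead.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # create_default_context already sets both; stated explicitly for clarity
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject unverifiable certificates
    ctx.check_hostname = True            # certificate must match the host name
    return ctx


# Usage (network access, not executed here):
#   import urllib.request
#   ctx = make_verified_context()
#   with urllib.request.urlopen("https://news.baidu.com/", context=ctx) as r:
#       print(r.read().decode("utf-8"))
```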
 
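####With query parameters
import urllib3
from urllib.parse import urlencode

http = urllib3.PoolManager()
args = {'wd': '人民币'}  # 'wd' is Baidu's search-keyword parameter; '人民币' means "renminbi"
# Wrong: formatting the dict directly produces "...s?{'wd': '人民币'}", not a query string
# url = 'http://www.baidu.com/s?%s' % (args)
url = 'http://www.baidu.com/s?%s' % urlencode(args)  # percent-encodes the keyword as UTF-8
print(url)
# resp = http.request('GET', url)
# print(resp.data.decode('utf-8'))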
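To see concretely why the commented-out `% (args)` line is wrong and what `urlencode` produces instead, the sketch below compares the two using only the standard library (`build_search_url` is a hypothetical helper). `urlencode` percent-encodes non-ASCII text as UTF-8 by default; its `encoding` parameter switches that, for example to GBK for older sites that expect GBK-encoded queries.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://www.baidu.com/s"


def build_search_url(keyword: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: append ?wd=<keyword>, percent-encoded."""
    return "%s?%s" % (BASE, urlencode({"wd": keyword}, encoding=encoding))


args = {"wd": "人民币"}
wrong = "%s?%s" % (BASE, args)  # formats the dict's repr, not a query string
right = build_search_url("人民币")
print(wrong)  # http://www.baidu.com/s?{'wd': '人民币'}
print(right)  # http://www.baidu.com/s?wd=%E4%BA%BA%E6%B0%91%E5%B8%81

# Parsing the query string back recovers the original keyword
print(parse_qs(urlsplit(right).query)["wd"][0])  # 人民币
```

Each Chinese character takes three UTF-8 bytes, hence nine `%XX` escapes for three characters.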
 
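# Browser-like request headers, so the server treats the crawler like a normal browser
headers = {
    'Accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',  # 'br' omitted: urllib3 decodes Brotli only if the brotli package is installed
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Host': 'www.baidu.com',
    # Header values must be Latin-1, so the keyword is percent-encoded as a browser would send it
    'Referer': 'https://www.baidu.com/s?wd=%E4%BA%BA%E6%B0%91%E5%B8%81',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36',
}
# For GET requests, urllib3 URL-encodes `fields` into the query string
resp = http.request('GET', 'http://www.baidu.com/s', fields=args, headers=headers)
print(resp.data.decode('utf-8'))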
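The same browser-like headers can also be attached with the standard library alone. The sketch below only builds a `urllib.request.Request` and sends nothing; `build_search_request` and the trimmed header set are illustrative assumptions. Two details worth knowing: header values must be Latin-1, which is why the `Referer` reuses the already percent-encoded URL, and `Request` normalizes header names with `str.capitalize()`, so `User-Agent` is stored and looked up as `User-agent`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Illustrative subset of the browser headers used above
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"),
    "Accept-Language": "zh-CN,zh;q=0.9",
}


def build_search_request(keyword: str) -> Request:
    """Hypothetical helper: a GET request for a Baidu search, not yet sent."""
    url = "http://www.baidu.com/s?" + urlencode({"wd": keyword})
    headers = dict(BROWSER_HEADERS)
    headers["Referer"] = url  # already percent-encoded, so Latin-1 safe
    return Request(url, headers=headers, method="GET")


req = build_search_request("人民币")
print(req.full_url)
print(req.get_header("User-agent"))  # names are stored as key.capitalize()
# To actually send it: urllib.request.urlopen(req)
```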

Original article: https://www.cnblogs.com/Albert-w/p/9013194.html
