Study Notes: urllib

Date: 2018-05-15 22:33:57

Tags: AC   uid   window   web   gen   safari   use   html   code

Step 1:

GET

# -*- coding:utf-8  -*-
# Date: 2018/5/15 19:39
# Author: 小鼠标
from urllib import request

url = 'http://news.sina.com.cn/guide/'
response = request.urlopen(url)  # returns an http.client.HTTPResponse object
web_data = response.read().decode('utf-8')  # response body, decoded to str
web_status = response.status                # HTTP status code
print(web_status, web_data)
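
The GET example above assumes the request succeeds and that the page is UTF-8. As a minimal sketch (the helper name `fetch` and the 10-second default timeout are my own assumptions, not part of the original notes), the same call can be wrapped so that it sets a timeout, reads the charset from the response headers, and reports failures instead of raising:

```python
from urllib import request, error


def fetch(url, timeout=10):
    """Send a GET request; return (status, text), or (None, reason) on failure.

    `fetch` is a hypothetical helper for illustration only.
    """
    try:
        with request.urlopen(url, timeout=timeout) as response:
            # Use the charset announced by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or 'utf-8'
            text = response.read().decode(charset, errors='replace')
            return response.status, text
    except error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        return e.code, str(e.reason)
    except error.URLError as e:
        # No usable answer: DNS failure, refused connection, unknown scheme...
        return None, str(e.reason)
```

Calling `fetch('http://news.sina.com.cn/guide/')` would then return the status code and decoded page text. `HTTPError` is caught first because it is a subclass of `URLError`.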

POST

# -*- coding:utf-8  -*-
# Date: 2018/5/15 19:39
# Author: 小鼠标
from urllib import request, parse

url = 'http://news.sina.com.cn/guide/'
# Form fields to submit via POST
data = [
    ('name', 'xiaoshubiao'),
    ('pwd', 'xiaoshubiao')
]
# urlopen() requires the request body as bytes, so encode the query string
login_data = parse.urlencode(data).encode('utf-8')
response = request.urlopen(url, data=login_data)  # passing data makes this a POST
web_data = response.read().decode('utf-8')  # response body, decoded to str
web_status = response.status                # HTTP status code
print(web_status, web_data)
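
To see what actually gets sent in the POST example, it can help to inspect the encoded form before making any request. The sketch below reuses the form fields from the notes and only builds the request object; nothing goes over the network:

```python
from urllib import parse, request

form = [('name', 'xiaoshubiao'), ('pwd', 'xiaoshubiao')]

encoded = parse.urlencode(form)   # str in application/x-www-form-urlencoded form
body = encoded.encode('utf-8')    # bytes, as required for the request body

# Building a Request does not send anything; it only describes the request.
req = request.Request('http://news.sina.com.cn/guide/', data=body)

print(encoded)            # name=xiaoshubiao&pwd=xiaoshubiao
print(req.get_method())   # POST
```

The presence of `data` is what switches urllib from GET to POST; without it, `get_method()` returns `GET`.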

Step 2: Impersonating a browser

# -*- coding:utf-8  -*-
# Date: 2018/5/15 19:39
# Author: 小鼠标
from urllib import request

url = 'http://news.sina.com.cn/guide/'
req = request.Request(url)
# Send the headers a real browser would send, so the request is not identified as a script
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
response = request.urlopen(req)
web_data = response.read().decode('utf-8')  # response body, decoded to str
web_status = response.status                # HTTP status code
print(web_status, web_data)
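
As an alternative to repeated `add_header` calls, the headers can be passed as a dict when the `Request` is created. The sketch below (the variable name `headers` is my own) also shows one detail that is easy to trip over: urllib stores header names with only the first letter capitalized, so lookups must use `User-agent`, not `User-Agent`. Nothing is sent over the network here.

```python
from urllib import request

headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/55.0.2883.87 '
                   'UBrowser/6.2.3964.2 Safari/537.36'),
    'Accept': ('text/html,application/xhtml+xml,application/xml;q=0.9,'
               'image/webp,*/*;q=0.8'),
}

req = request.Request('http://news.sina.com.cn/guide/', headers=headers)

# Header names are normalized with str.capitalize() when they are stored.
print(req.has_header('User-agent'))   # True
print(req.get_header('User-Agent'))   # None, because the stored key is 'User-agent'
```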

Step 3: Using a proxy IP

# -*- coding:utf-8  -*-
# Date: 2018/5/15 19:39
# Author: 小鼠标
from urllib import request

url = 'http://news.sina.com.cn/guide/'
req = request.Request(url)
# Route http requests through a proxy IP
proxy = request.ProxyHandler({'http': '221.207.29.185:80'})
opener = request.build_opener(proxy, request.HTTPHandler)
request.install_opener(opener)  # make every later urlopen() call use this opener

req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
response = request.urlopen(req)
web_data = response.read().decode('utf-8')  # response body, decoded to str
web_status = response.status                # HTTP status code
print(web_status, web_data)
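
`install_opener` changes the behavior of every later `urlopen` call in the process. If only some requests should go through the proxy, one option is to keep the opener local and call its `open` method directly. This is a minimal sketch: the helper name `build_proxy_opener` is hypothetical, and the proxy address is simply the one from the notes, which may no longer be reachable.

```python
from urllib import request


def build_proxy_opener(proxy_address):
    """Return an opener that sends http requests through proxy_address.

    Hypothetical helper; it does not touch the global opener.
    """
    proxy = request.ProxyHandler({'http': proxy_address})
    return request.build_opener(proxy)


opener = build_proxy_opener('221.207.29.185:80')

# Only requests made through this opener use the proxy, for example:
#   response = opener.open('http://news.sina.com.cn/guide/', timeout=10)
# Plain request.urlopen() calls elsewhere are unaffected.
```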

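Step 4: Parsing the content

  You can use the ready-made BeautifulSoup library, or match the content with regular expressions from the re module; the principle is much the same either way.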
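
To make the comparison concrete, here is a small sketch that extracts links from an HTML fragment both ways. The sample HTML, the function names, and the regular expression are illustrative assumptions, not taken from the original notes. BeautifulSoup is a third-party package (`pip install beautifulsoup4`), so it is imported inside its function and only needed if that function is called.

```python
import re

# Made-up fragment standing in for the downloaded page (web_data).
SAMPLE_HTML = '''
<ul>
  <li><a href="http://news.sina.com.cn/china/">China</a></li>
  <li><a href="http://news.sina.com.cn/world/" class="nav">World</a></li>
</ul>
'''


def extract_links_re(html):
    """Return (href, text) pairs using a regular expression.

    Good enough for simple, regular markup; fragile on messy real-world HTML.
    """
    pattern = r'<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)</a>'
    return re.findall(pattern, html, re.S)


def extract_links_bs4(html):
    """Return (href, text) pairs using BeautifulSoup (third-party)."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    return [(a['href'], a.get_text(strip=True))
            for a in soup.find_all('a', href=True)]


print(extract_links_re(SAMPLE_HTML))
```

The regular expression is shorter for a one-off pattern, while BeautifulSoup tends to cope better with attribute order, quoting differences, and nested tags.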

Original article: https://www.cnblogs.com/7749ha/p/9042861.html
