
Crawler day 04_01 (crawling a Baidu page)

Date: 2017-11-11 13:20:37


import urllib.request
import http.cookiejar
from lxml import etree
head = {
    'Connection': 'Keep-Alive',
    'Accept': 'text/html, application/xhtml+xml, */*',
    'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko'
}
# Build an opener with cookie support and attach the request headers above
def makeMyOpener(head):
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    header = []
    for key, value in head.items():
        elem = (key, value)
        header.append(elem)
    opener.addheaders = header
    return opener
# Crawl a Baidu search results page through the cookie-aware opener
oper = makeMyOpener(head)
url = "https://www.baidu.com/s?ie=utf-8&f=3&rsv_bp=1&rsv_idx=1&tn=baidu&wd=python%20str%20%E8%BD%AC%20int&oq=python%2520str%2520%25E8%25BD%25AC%2520int&rsv_pq=c24aa0760000154b&rsv_t=c323uk7fLXupzfPqhHcqM%2F6l8k7Re4K90ZvzI33LDwW0kHYMiSED9rhKzCg&rqlang=cn&rsv_enter=0&prefixsug=python%2520str%2520%25E8%25BD%25AC%2520int&rsp=0"
uop = oper.open(url, timeout=1000)
data = uop.read()
html = data.decode('utf-8')  # the page is requested as UTF-8 (ie=utf-8 in the URL)
print(html)
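
The script imports lxml's etree but never uses it. As a minimal follow-up sketch, the decoded page held in html could be parsed like this; the XPath expressions for Baidu's result markup are assumptions and may need adjusting:

from lxml import etree

# Parse the decoded HTML string fetched above
tree = etree.HTML(html)
# Page <title>, normally the query followed by "_百度搜索"
print(tree.xpath('//title/text()'))
# Organic results are assumed to be divs whose class contains "result",
# each with an <h3><a> holding the result title
for link in tree.xpath('//div[contains(@class, "result")]//h3//a'):
    print(link.xpath('string(.)').strip())

Because the opener keeps a reference to the CookieJar, any cookies Baidu sets on the first response are sent back automatically on later oper.open() calls, which is the point of building the opener with HTTPCookieProcessor instead of calling urllib.request.urlopen directly.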

 


Original post: http://www.cnblogs.com/qieyu/p/7818516.html
