Writing Web Crawlers in Python from Scratch -- Your First Web Crawler, Part 2: Setting the User Agent

Posted: 2017-10-08 15:33:39

Tags: down, read, ade, urllib, zero-basics, by default, web page, print

1. Setting the user agent

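By default, urllib2 downloads web pages using Python-urllib/2.7 as its user agent, where 2.7 is the Python version number. Some websites block this default user agent, so to make downloads more reliable we should set the user agent ourselves. The code below gives the download function a user agent named "wswp".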

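import urllib2

def download(url, user_agent='wswp', num_retries=2):
    print 'Downloading:', url
    # Send our own user agent instead of the default Python-urllib/2.7
    headers = {'User-agent': user_agent}
    request = urllib2.Request(url, headers=headers)
    try:
        # Pass the Request object (not the bare URL) so the header is used
        html = urllib2.urlopen(request).read()
    except urllib2.URLError as e:
        print 'Download error:', e.reason
        html = None
        if num_retries > 0:
            # Only HTTPError (a URLError subclass) has a .code attribute
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # Recursively retry 5XX HTTP errors
                return download(url, user_agent, num_retries - 1)
    return html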
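For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the following is a minimal sketch of the same function (a port for illustration, not the author's code). To check that the custom header is really sent and that 5XX errors are retried without hitting a real site, it runs against a throwaway local server; the Handler class and the status sequence [500, 500, 200] are hypothetical test scaffolding.

```python
import http.server
import threading
import urllib.error
import urllib.request


def download(url, user_agent='wswp', num_retries=2):
    """Fetch url with a custom User-Agent, retrying 5XX errors."""
    print('Downloading:', url)
    # Attach the custom agent to the Request; urlopen() must receive this
    # Request object, otherwise the header is never sent.
    request = urllib.request.Request(url, headers={'User-Agent': user_agent})
    try:
        with urllib.request.urlopen(request) as response:
            html = response.read()
    except urllib.error.URLError as e:
        print('Download error:', e.reason)
        html = None
        # Only HTTPError (a URLError subclass) carries a status code.
        if num_retries > 0 and hasattr(e, 'code') and 500 <= e.code < 600:
            # Recursively retry 5XX server errors.
            return download(url, user_agent, num_retries - 1)
    return html


# --- Hypothetical local test server: fails twice with 500, then succeeds ---
seen_agents = []            # User-Agent header of every request received
status_codes = [500, 500, 200]


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        seen_agents.append(self.headers.get('User-Agent'))
        code = status_codes.pop(0)
        body = b'ok' if code == 200 else b'server error'
        self.send_response(code)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = http.server.HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
html = download('http://127.0.0.1:%d/' % server.server_port)
server.shutdown()
server.server_close()

print(html, seen_agents)  # b'ok' ['wswp', 'wswp', 'wswp']
# The agent urllib would otherwise send, a list like
# [('User-agent', 'Python-urllib/3.x')] (version depends on the interpreter):
print(urllib.request.build_opener().addheaders)
```

Because the Request already carries a User-agent header, urllib does not add the default Python-urllib value from the opener's addheaders, so every one of the three attempts (two failures plus the successful retry) identifies itself as "wswp".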

Original post: http://www.cnblogs.com/mrruning/p/7637441.html
