
Python learning: web crawling



Reposted from 静觅's blog.

The most basic way to download a web page

import urllib2
response = urllib2.urlopen("http://www.baidu.com")
print response.read()
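
The object returned by urlopen behaves like a file: besides read(), it also exposes the status code, the final URL, and the response headers. A minimal sketch with the same Baidu URL:

import urllib2
response = urllib2.urlopen("http://www.baidu.com")
print response.getcode()    # HTTP status code, e.g. 200
print response.geturl()     # final URL after any redirects
print response.info()       # response headers
print response.read(100)    # first 100 bytes of the body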

POST method

import urllib
import urllib2

values = {"username": "*****", "password": "*****"}
data = urllib.urlencode(values)
url = "   "                      # the login URL goes here (left blank in the original)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()
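
Many sites reject requests that do not look like they come from a browser, so the Request object can also carry headers such as User-Agent and Referer. A small sketch; the URL and header values below are placeholders, not from the original post:

import urllib
import urllib2

url = "http://example.com/login"        # hypothetical login URL
headers = {
    "User-Agent": "Mozilla/5.0",        # pretend to be a browser
    "Referer": "http://example.com/"    # some sites also check the Referer
}
values = {"username": "*****", "password": "*****"}
data = urllib.urlencode(values)
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
print response.read()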

GET method

import urllib2
import urllib

values = {}
values["username"] = "*****"     # placeholder credentials, as in the POST example
values["password"] = "*****"
data = urllib.urlencode(values)
url = "   "                      # the target URL goes here (left blank in the original)
geturl = url + "?" + data
request = urllib2.Request(geturl)
response = urllib2.urlopen(request)
print response.read()
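
urlencode only turns the dict into a query string, so the GET URL can be printed and inspected before it is sent. A quick sketch with made-up values:

import urllib
values = {"username": "joe", "password": "1234"}   # made-up credentials
print urllib.urlencode(values)
# prints something like: username=joe&password=1234 (key order may vary)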

Setting a proxy

import urllib2
enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:8080"})
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
urllib2.install_opener(opener)
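
install_opener makes the proxy the default for every later urllib2.urlopen call; to use the proxy for a single request only, the opener can be used directly instead. A small sketch, assuming the proxy_handler defined above:

# after install_opener, all urlopen calls go through the proxy
response = urllib2.urlopen("http://www.baidu.com")

# or use an opener for one request without installing it globally
opener = urllib2.build_opener(proxy_handler)
response = opener.open("http://www.baidu.com")
print response.read()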

Setting a timeout

import urllib2
data = None                      # optional POST data; None means a plain GET
response = urllib2.urlopen("http://www.baidu.com", data, 10)   # third argument is the timeout in seconds
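
When there is no POST data, the timeout can also be passed as a keyword argument; a request that times out typically surfaces as a URLError. A minimal sketch:

import urllib2
try:
    response = urllib2.urlopen("http://www.baidu.com", timeout=10)
    print response.read(100)
except urllib2.URLError, e:
    # a connection timeout shows up here with a socket.timeout as the reason
    print "request failed:", e.reason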

Exception handling

import urllib2

req = urllib2.Request("http://blog.csdn.net/cqcre")
try:
    urllib2.urlopen(req)
except urllib2.URLError, e:
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
else:
    print "OK"

Setting cookies

import urllib
import urllib2
import cookielib

filename = "cookie.txt"
# Declare a MozillaCookieJar instance to hold the cookies; they are written to the file later
cookie = cookielib.MozillaCookieJar(filename)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
postdata = urllib.urlencode({
            "stuid": "201200131012",
            "pwd": "23342321"
        })
# URL for logging in to the academic-affairs system
loginUrl = "http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bks_login2.login"
# Simulate the login; the cookies are captured in the jar
result = opener.open(loginUrl, postdata)
# Save the cookies to cookie.txt
cookie.save(ignore_discard=True, ignore_expires=True)
# Reuse the cookies to request another URL, here the grade-query page
gradeUrl = "http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bkscjcx.curscopre"
# Request the grade-query page
result = opener.open(gradeUrl)
print result.read()
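
The saved cookie file can be reused in a later run: load it back into a MozillaCookieJar and build a new opener from it. A minimal sketch, assuming the cookie.txt and grade-query URL from above:

import urllib2
import cookielib

cookie = cookielib.MozillaCookieJar()
# load the cookies saved by the previous session
cookie.load("cookie.txt", ignore_discard=True, ignore_expires=True)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
result = opener.open("http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bkscjcx.curscopre")
print result.read()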

 

Original post: http://www.cnblogs.com/ajmd/p/5876947.html
