码迷,mamicode.com
首页 > 编程语言 > 详细

Python爬虫学习1

时间:2016-10-05 17:12:45      阅读:119      评论:0      收藏:0      [点我收藏+]

标签:

 1 #coding=utf-8
 2 from urllib2 import urlopen
 3 from bs4 import BeautifulSoup
 4 import urllib2
 5 url="http://pythonscraping.com/pages/page1.html"
 6 def getTitle(url):
 7     """
 8     说明一下,处理异常的过程
 9     1.检查是否能打开网页 异常类型为urllib2.HTTPError
10     2.检查是否服务器存在,不存在返回空,那么在read是返回AttributeError
11     :param url:
12     :return:
13     """
14     try:
15 
16         html=urlopen(url)
17     except urllib2.HTTPError as e:
18 #这里的错误是网页不存在
19         print e
20         return None
21     try:
22         bsobj=BeautifulSoup(html.read(),"html.parser")
23         title=bsobj.body.h1
24     except AttributeError as e:
25         return None
26     return title
27 title=getTitle(url)
28 if title is None:
29     print "Title could not be found"
30 else:
31     print title

 

Python爬虫学习1

标签:

原文地址:http://www.cnblogs.com/dream-for/p/5932335.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!