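# Downloading and installing BeautifulSoup
# Note: this package name installs BeautifulSoup 3, which targets Python 2.
# The maintained release is published as beautifulsoup4 and imported as bs4.
pip install BeautifulSoup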
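# BeautifulSoup quick start
from BeautifulSoup import BeautifulSoup

# html_doc is assumed to already hold the page's HTML source as a string
soup = BeautifulSoup(html_doc)
print soup.title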
# BeautifulSoup result
<title>前门大街_百度百科</title>
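# BeautifulSoup example
from BeautifulSoup import BeautifulSoup
import urllib2

# html_doc must hold the page's HTML source; the line that fetched it
# (presumably with urllib2) is missing from the source listing
soup = BeautifulSoup(html_doc)
print type(soup)               # the parsed document object
print type(soup.title)         # a tag inside the document
print type(soup.title.string)  # the text inside that tag
print soup.title
print soup.title.string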
# BeautifulSoup example result
<class 'BeautifulSoup.BeautifulSoup'>
<class 'BeautifulSoup.Tag'>
<class 'BeautifulSoup.NavigableString'>
<title>百度一下,你就知道</title>
百度一下,你就知道
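The listing above is written for Python 2 and BeautifulSoup 3. As a rough equivalent for readers on Python 3, the sketch below assumes the `beautifulsoup4` package is installed and uses the standard library's `html.parser` backend. The `html_doc` string is a hypothetical stand-in, since the original post does not show how its document was obtained.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (not from the original post).
html_doc = (
    "<html><head><title>Example page</title></head>"
    "<body><p>Hello</p></body></html>"
)

# Naming the parser explicitly avoids depending on whichever one is installed.
soup = BeautifulSoup(html_doc, "html.parser")

print(type(soup).__name__)               # BeautifulSoup
print(type(soup.title).__name__)         # Tag
print(type(soup.title.string).__name__)  # NavigableString
print(soup.title)                        # <title>Example page</title>
print(soup.title.string)                 # Example page
```

The three object kinds match the ones in the result above; only the module name changes from `BeautifulSoup` to `bs4`.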
# BeautifulSoup example
title = soup.title
print type(title.contents)  # .contents is a list of the tag's children
print title.contents
print title.contents[0]

# BeautifulSoup example result
<type 'list'>
[u'\u767e\u5ea6\u4e00\u4e0b\uff0c\u4f60\u5c31\u77e5\u9053']
百度一下,你就知道
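The `contents` list is easier to see on a tag with more than one child. The following is a minimal sketch, again assuming Python 3 with `beautifulsoup4` and a hypothetical `html_doc`; it shows that the list mixes text nodes and nested tags in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (not from the original post).
html_doc = (
    "<html><head><title>Example page</title></head>"
    "<body><p>Hello <b>world</b>!</p></body></html>"
)

soup = BeautifulSoup(html_doc, "html.parser")

title = soup.title
print(type(title.contents))  # <class 'list'>
print(title.contents[0])     # Example page

# A tag with mixed children: text, a nested <b> tag, then text again.
p = soup.p
for child in p.contents:
    print(type(child).__name__, repr(str(child)))
```

Here `p.contents` has three entries: the string `Hello `, the `<b>` tag, and the string `!`.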
# BeautifulSoup example
html = soup.html
print html.next                   # the element parsed right after <html>: <head>
print ''
print html.next.next              # the element parsed right after <head>: the first <meta>
print html.next.next.nextSibling  # the next sibling of that <meta>

# BeautifulSoup example result
<head><meta http-equiv="content-type" content="text/html;charset=utf-8" /><meta http-equiv="X-UA-Compatible" content="IE=Edge" /><meta content="always" name="referrer" /><meta name="theme-color" content="#2932e1" /><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="icon" sizes="any" mask="mask" href="//www.baidu.com/img/baidu.svg" /><link rel="dns-prefetch" href="//s1.bdstatic.com" /><link rel="dns-prefetch" href="//t1.baidu.com" /><link rel="dns-prefetch" href="//t2.baidu.com" /><link rel="dns-prefetch" href="//t3.baidu.com" /><link rel="dns-prefetch" href="//t10.baidu.com" /><link rel="dns-prefetch" href="//t11.baidu.com" /><link rel="dns-prefetch" href="//t12.baidu.com" /><link rel="dns-prefetch" href="//b1.bdstatic.com" /><title>百度一下,你就知道</title> ...... </head>

<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
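In `beautifulsoup4` the same navigation is spelled `next_element` and `next_sibling` rather than BeautifulSoup 3's `next` and `nextSibling`. The sketch below assumes Python 3 with `beautifulsoup4` and a small hypothetical `html_doc` written without whitespace between tags, so that no whitespace text nodes appear between the elements.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (not from the original post), with no
# whitespace between tags so every step lands on a tag, not a text node.
html_doc = (
    '<html><head>'
    '<meta charset="utf-8"/>'
    '<meta name="referrer" content="always"/>'
    '<title>Example page</title>'
    '</head><body></body></html>'
)

soup = BeautifulSoup(html_doc, "html.parser")

html = soup.html
head = html.next_element               # element parsed right after <html>
first_meta = head.next_element         # element parsed right after <head>
second_meta = first_meta.next_sibling  # next node at the same level

print(head.name)            # head
print(first_meta.attrs)     # {'charset': 'utf-8'}
print(second_meta["name"])  # referrer
```

`next_element` follows parse order and can descend into children, while `next_sibling` stays at the same nesting level, which is why the two are chained in the example above.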
Original article: http://www.cnblogs.com/xiamaogeng/p/4646313.html