Using BeautifulSoup

Date: 2016-11-27 19:01:50


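The requests library can fetch the data on a web page, but what comes back is structured markup (HTML) that we cannot use directly. It has to be converted into a form that is easier to work with.

BeautifulSoup parses that markup and strips away the tags, leaving the data and text content of the page.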
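As a minimal sketch of what "stripping the tags" means, the snippet below parses a small hard-coded HTML string and calls get_text(). The string and the variable names are made up for illustration; in practice the HTML would usually be the body of a fetched page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, standing in for the body of a fetched page.
html_doc = '<html><body><h1>HelloWorld</h1><p>Some <b>bold</b> text</p></body></html>'

soup = BeautifulSoup(html_doc, 'html.parser')

# get_text() keeps only the text nodes; the separator is placed between them,
# and strip=True trims whitespace around each piece.
plain_text = soup.get_text(separator=' ', strip=True)
print(plain_text)
```

The result contains the words from the page but none of the angle-bracket markup.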

1. Extract the content of a specific tag

A single tag

from bs4 import BeautifulSoup
html_sample = '<html><body><h1 id="title"> HelloWorld</h1><a href="#" class="link">This is link1</a><a href="# link2" class="link"> This is link2</a></body></html>'

soup = BeautifulSoup(html_sample, 'html.parser')
header = soup.select('h1')  # select() returns a list of matching tags
print(header[0].text)       # text of the first (and only) h1
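Because select() returns a list, indexing with [0] fails when nothing matches. The sketch below, using a shortened variant of the sample HTML, shows select_one() as one way to handle that case; the h2 lookup is an illustrative assumption, not part of the original example.

```python
from bs4 import BeautifulSoup

# Shortened variant of the sample HTML (illustrative).
html_sample = '<html><body><h1 id="title">HelloWorld</h1><a href="#" class="link">This is link1</a></body></html>'
soup = BeautifulSoup(html_sample, 'html.parser')

# select() always returns a list; for a tag that is absent the list is empty,
# so missing[0] would raise IndexError.
missing = soup.select('h2')

# select_one() returns the first match, or None when nothing matches.
first_h1 = soup.select_one('h1')
no_h2 = soup.select_one('h2')

if first_h1 is not None:
    print(first_h1.text)
print(len(missing), no_h2)
```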


Multiple tags of the same kind

from bs4 import BeautifulSoup
html_sample = '<html><body><h1 id="title"> HelloWorld</h1><a href="#" class="link">This is link1</a><a href="# link2" class="link"> This is link2</a></body></html>'

soup = BeautifulSoup(html_sample, 'html.parser')
header = soup.select('a')  # every a tag in the document
for alink in header:
    print(alink.text)

2. Select elements by a specific CSS attribute
An id is prefixed with #

from bs4 import BeautifulSoup
html_sample = '<html><body><h1 id="title"> HelloWorld</h1><a href="#" class="link">This is link1</a><a href="# link2" class="link"> This is link2</a></body></html>'

soup = BeautifulSoup(html_sample, 'html.parser')
header = soup.select('#title')  # the element whose id is "title"
print(header)                   # prints the list of matching tags, markup included
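Printing the result of select() shows a list of tags with their markup. To get at the pieces, one option is to take the first element and read its name, text, and attributes, as in this sketch (shortened sample HTML, illustrative variable names):

```python
from bs4 import BeautifulSoup

# Shortened variant of the sample HTML (illustrative).
html_sample = '<html><body><h1 id="title">HelloWorld</h1><a href="#" class="link">This is link1</a></body></html>'
soup = BeautifulSoup(html_sample, 'html.parser')

matches = soup.select('#title')   # list with the single element whose id is "title"
title_tag = matches[0]

title_name = title_tag.name       # tag name
title_text = title_tag.text       # text inside the tag
title_id = title_tag['id']        # attribute access works like a dictionary
print(title_name, title_text, title_id)
```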


A class is prefixed with .

from bs4 import BeautifulSoup
html_sample = '<html><body><h1 id="title"> HelloWorld</h1><a href="#" class="link">This is link1</a><a href="# link2" class="link"> This is link2</a></body></html>'

soup = BeautifulSoup(html_sample, 'html.parser')
header = soup.select('.link')  # every element with class "link"
for alink in header:
    print(alink.text)
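A class selector matches any tag carrying that class, not only a tags. The sketch below assumes a modified sample in which the h1 also has class "link" (that change is made up for illustration) and narrows the match by combining the tag name with the class.

```python
from bs4 import BeautifulSoup

# Modified sample: the h1 also carries class "link" (an assumption for this demo).
html_sample = '<html><body><h1 id="title" class="link">HelloWorld</h1><a href="#" class="link">This is link1</a><a href="# link2" class="link">This is link2</a></body></html>'
soup = BeautifulSoup(html_sample, 'html.parser')

all_link_class = soup.select('.link')   # the h1 and both a tags
only_anchors = soup.select('a.link')    # only a tags with class "link"

# class can hold several values, so BeautifulSoup returns it as a list.
link_classes = only_anchors[0]['class']
print(len(all_link_class), len(only_anchors), link_classes)
```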

3. Get the link target (href) of each a tag

from bs4 import BeautifulSoup
html_sample = '<html><body><h1 id="title"> HelloWorld</h1><a href="#" class="link">This is link1</a><a href="# link2" class="link"> This is link2</a></body></html>'

soup = BeautifulSoup(html_sample, 'html.parser')
header = soup.select('a')
for alink in header:
    print(alink['href'])  # attributes are read with dictionary-style access
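Dictionary-style access raises KeyError when a tag has no href. As a hedged sketch, the snippet below uses a made-up sample containing one a tag without href and shows two ways around that: get(), which returns None for a missing attribute, and the a[href] selector, which skips such tags.

```python
from bs4 import BeautifulSoup

# Made-up sample: the second a tag deliberately has no href attribute.
html_sample = '<html><body><a href="#" class="link">This is link1</a><a class="link">No href here</a></body></html>'
soup = BeautifulSoup(html_sample, 'html.parser')

# get() returns None instead of raising KeyError when the attribute is absent.
hrefs = [alink.get('href') for alink in soup.select('a')]

# The attribute selector a[href] matches only a tags that have an href.
with_href = soup.select('a[href]')
print(hrefs, len(with_href))
```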


Original article: http://www.cnblogs.com/zlj1992/p/6106653.html
