Official BeautifulSoup documentation: https://beautifulsoup.readthedocs.io/zh_CN/latest/#id8
The full documentation is rather long, so I have trimmed it down to the parts I actually use.
1.index.html
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
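The later snippets use a variable called html_doc that these notes never define. Here is a minimal sketch of one way to fill it, assuming the markup above is saved as index.html next to the script. The helper name load_soup is my own and is not part of bs4.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def load_soup(path):
    """Read an HTML file and parse it with Python's built-in html.parser."""
    # Assumption: the file is UTF-8 encoded.
    html_doc = Path(path).read_text(encoding="utf-8")
    return BeautifulSoup(html_doc, "html.parser")


# Usage, assuming index.html exists in the working directory:
# soup = load_soup("index.html")
# print(soup.title.string)
```

Pasting the markup into a triple-quoted string named html_doc works just as well and skips the file altogether.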
2. .prettify()--output in a standard indented format
from bs4 import BeautifulSoup

# html_doc is a string holding the markup from section 1
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.prettify())
# <html>
#  <head>
#   <title>
#    The Dormouse's story
#   </title>
#  </head>
#  <body>
#   <p class="title">
#    <b>
#     The Dormouse's story
#    </b>
#   </p>
#   <p class="story">
#    Once upon a time there were three little sisters; and their names were
#    <a class="sister" href="http://example.com/elsie" id="link1">
#     Elsie
#    </a>
#    ,
#    <a class="sister" href="http://example.com/lacie" id="link2">
#     Lacie
#    </a>
#    and
#    <a class="sister" href="http://example.com/tillie" id="link3">
#     Tillie
#    </a>
#    ; and they lived at the bottom of a well.
#   </p>
#   <p class="story">
#    ...
#   </p>
#  </body>
# </html>
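prettify() returns a string, and it can be called on a single tag as well as on the whole soup, which helps when only one fragment needs inspecting. A small sketch, using a one-line fragment of the sample document (the variable names are just for illustration):

```python
from bs4 import BeautifulSoup

snippet = """<p class="title"><b>The Dormouse's story</b></p>"""
soup = BeautifulSoup(snippet, "html.parser")

# Pretty-print only the <p> tag instead of the whole document.
pretty = soup.p.prettify()
print(pretty)

# Each tag and each text node ends up on its own line.
lines = [line.strip() for line in pretty.splitlines() if line.strip()]
```

Because the result is an ordinary str, it can be written to a file or diffed like any other text.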
3.Selecting tags and attributes
soup.title
# <title>The Dormouse's story</title>

soup.title.name
# 'title'

soup.title.string
# "The Dormouse's story"

soup.title.parent.name
# 'head'

soup.p
# <p class="title"><b>The Dormouse's story</b></p>

soup.p['class']
# ['title']  (class is a multi-valued attribute, so bs4 returns a list)

soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie

print(soup.get_text())
# The Dormouse's story
#
# The Dormouse's story
#
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
#
# ...

# Tag
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b
type(tag)
# <class 'bs4.element.Tag'>

# Name
tag.name
# 'b'
tag.name = "blockquote"
tag
# <blockquote class="boldest">Extremely bold</blockquote>

# Attributes
tag['class']
# ['boldest']
tag.attrs
# {'class': ['boldest']}
tag['class'] = 'verybold'
tag['id'] = 1
tag
# <blockquote class="verybold" id="1">Extremely bold</blockquote>
del tag['class']
del tag['id']
tag
# <blockquote>Extremely bold</blockquote>
tag['class']
# KeyError: 'class'
print(tag.get('class'))
# None
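Since tag['class'] raises KeyError when the attribute is missing while tag.get('class') returns None, .get() is usually the safer choice when looping over many tags. A rough sketch of collecting link text, href and classes; the markup here is made up for illustration, with the second link deliberately lacking an href:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second <a> has no href and two classes.
html = (
    '<p class="story">'
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
    '<a class="sister brother" id="link2">Lacie</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.find_all("a"):
    # .get() returns None (or the given default) instead of raising KeyError.
    links.append((a.get_text(), a.get("href"), a.get("class", [])))

for text, href, classes in links:
    print(text, href, classes)
```

The class value comes back as a list because HTML allows several classes on one tag, so "sister brother" is split into two entries.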
Original post: https://www.cnblogs.com/eilinge/p/9641598.html