BeautifulSoup4: Extracting All URL Links from HTML

Posted: 2020-06-27 10:09:27

'''
Extract all URL links from HTML
'''

import requests
from bs4 import BeautifulSoup
import re

# r = requests.get("https://python123.io/ws/demo.html")
# demo = r.text

# Inline copy of the demo page, so the script also runs without network access
demo = """
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>
"""
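
"""
find_all(name, attrs, recursive, string, limit, **kwargs):
    name       filter by tag name (a string, a list, a regex, or True)
    attrs      filter by attributes (a plain string means a CSS class)
    recursive  True (default) searches all descendants; False only direct children
    string     match text content instead of, or together with, tag names
    limit      stop after this many matches
    **kwargs   attribute filters such as id="link1"
<tag>(..) is equivalent to <tag>.find_all(..)
soup(..) is equivalent to soup.find_all(..)
"""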

soup = BeautifulSoup(demo, "html.parser")
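The docstring above notes that calling a soup or a tag is shorthand for find_all(), and it also lists the recursive parameter. A minimal sketch, assuming bs4 is installed and using a small hypothetical snippet instead of the demo page, checks both:

```python
from bs4 import BeautifulSoup

html = "<div><p><a href='/x'>x</a></p><a href='/y'>y</a></div>"  # hypothetical sample markup
soup = BeautifulSoup(html, "html.parser")

# Calling the soup object is shorthand for find_all()
assert soup("a") == soup.find_all("a")

# The same shorthand works on any tag
div = soup.div
assert div("a") == div.find_all("a")

# recursive=True (default) searches all descendants; False only looks at direct children
print(len(div.find_all("a")))                   # 2: the nested <a> and the direct one
print(len(div.find_all("a", recursive=False)))  # 1: only the <a> that is a direct child of <div>
```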

for link in soup.find_all('a'):  # 1. find every <a> tag
    print(link.get("href"))  # 2. read its href attribute, i.e. the link URL (None if the tag has no href)
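The loop above prints None for any <a> tag without an href, and it prints relative links exactly as written. A slightly more defensive sketch, assuming you want absolute, de-duplicated URLs: passing href=True to find_all() keeps only <a> tags that actually have an href, and urllib.parse.urljoin resolves each value against the page URL. The helper name extract_links, the sample markup and the base URL are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Hypothetical helper: absolute URLs of all <a href=...> tags, in document order, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    seen = set()
    for a in soup.find_all("a", href=True):  # href=True skips <a> tags that have no href attribute
        url = urljoin(base_url, a["href"].strip())  # resolve relative paths against the page URL
        if url not in seen:  # keep only the first occurrence of each URL
            seen.add(url)
            links.append(url)
    return links


sample = """
<a href="http://www.icourse163.org/course/BIT-268001">Basic Python</a>
<a href="/ws/other.html">Relative link</a>
<a name="top">Anchor without href</a>
<a href="/ws/other.html">Duplicate</a>
"""
print(extract_links(sample, "https://python123.io/ws/demo.html"))
# ['http://www.icourse163.org/course/BIT-268001', 'https://python123.io/ws/other.html']
```

Absolute URLs pass through urljoin unchanged, so the same call handles both kinds of link.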

print(soup.find_all('a'))  # find all <a> tags
# Output:
# [<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>, <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>]
print(soup.find_all(['a', 'b']))  # a list matches any of its names: finds <a> and <b> tags in one call
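When find_all() receives a list, a tag matches if its name equals any entry, and the results come back in document order rather than grouped by name. A small check, assuming bs4 is installed and using a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = "<p><a href='/1'>one</a> <b>bold</b> <a href='/2'>two</a></p>"  # hypothetical sample markup
soup = BeautifulSoup(html, "html.parser")

names = [tag.name for tag in soup.find_all(["a", "b"])]
print(names)  # ['a', 'b', 'a']: document order, not all <a> tags first
```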

for tag in soup.find_all(True):  # True matches every tag in the document
    print(tag.name)

# Output:
# html
# head
# title
# body
# p
# b
# p
# a
# a
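find_all(True) returns every tag but none of the text between tags, which makes it a quick way to survey a page's element structure. A sketch, assuming bs4 is installed, that tallies tag names with collections.Counter on a hypothetical snippet:

```python
from collections import Counter

from bs4 import BeautifulSoup

html = "<html><body><p>Hi <b>there</b></p><p><a href='/x'>x</a></p></body></html>"  # hypothetical sample
soup = BeautifulSoup(html, "html.parser")

tags = soup.find_all(True)  # every Tag below the soup; text nodes are not included
counts = Counter(tag.name for tag in tags)
print(len(tags))                  # 6: html, body, p, b, p, a
print(counts["p"], counts["a"])   # 2 1
```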

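# Show only tags whose names start with "b": <body> and <b>
# find_all() tests a regex against tag names with re.search(), so '^' anchors it to the start
for tag in soup.find_all(re.compile('^b')):
    print(tag.name)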
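Beautiful Soup applies a compiled regular expression to tag names with its search() method, so an unanchored pattern matches the letter anywhere in the name. For that reason re.compile('b') would also catch <table> and <tbody>, while '^b' only keeps names that start with b. A sketch, assuming bs4 is installed, on a hypothetical table:

```python
import re

from bs4 import BeautifulSoup

html = "<table><tbody><tr><td><b>x</b></td></tr></tbody></table>"  # hypothetical sample
soup = BeautifulSoup(html, "html.parser")

anywhere = [t.name for t in soup.find_all(re.compile("b"))]   # "b" anywhere in the tag name
at_start = [t.name for t in soup.find_all(re.compile("^b"))]  # tag name must start with "b"
print(anywhere)  # ['table', 'tbody', 'b']
print(at_start)  # ['b']
```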

print(soup.find_all('p', 'course'))  # <p> tags whose CSS class is "course"
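A plain string in the second position of find_all() is treated as a CSS class filter, so soup.find_all('p', 'course') is the same as soup.find_all('p', class_='course'). Because class is a multi-valued attribute, a tag matches if any one of its classes equals the string. A quick check, assuming bs4 is installed and using hypothetical markup:

```python
from bs4 import BeautifulSoup

html = '<p class="course intro">A</p><p class="title">B</p><p class="course">C</p>'  # hypothetical
soup = BeautifulSoup(html, "html.parser")

by_position = soup.find_all("p", "course")       # string in the attrs slot means "CSS class"
by_keyword = soup.find_all("p", class_="course")  # class_ avoids the reserved word "class"
assert by_position == by_keyword
print([p.get_text() for p in by_keyword])  # ['A', 'C']: "course intro" matches too
```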

print(soup.find_all(id="link1"))  # tags whose id attribute equals "link1"

print(soup.find_all(id=re.compile("^link")))  # tags whose id starts with "link"
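Any keyword argument that is not one of find_all()'s own parameters is treated as an attribute filter. Besides a string or a compiled regex, the value can be True, which matches every tag that has the attribute at all. A sketch, assuming bs4 is installed and using hypothetical markup:

```python
import re

from bs4 import BeautifulSoup

html = '<a id="link1">1</a><a id="nav">n</a><a id="link2">2</a><a>no id</a>'  # hypothetical
soup = BeautifulSoup(html, "html.parser")

with_id = [a["id"] for a in soup.find_all(id=True)]                # any tag that has an id
link_ids = [a["id"] for a in soup.find_all(id=re.compile("^link"))]  # id starting with "link"
print(with_id)   # ['link1', 'nav', 'link2']
print(link_ids)  # ['link1', 'link2']
```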

print(soup.find_all(attrs={"class": "py1"}))  # attribute filters passed as a dict: tags with class "py1"
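The attrs dictionary is needed when an attribute name cannot be written as a Python keyword argument: class is a reserved word (Beautiful Soup accepts class_ instead), and names containing hyphens, such as data-* attributes, are not valid identifiers at all. A sketch, assuming bs4 is installed, with a hypothetical data-course attribute:

```python
from bs4 import BeautifulSoup

html = ('<a class="py1" data-course="basic" href="/b">Basic</a>'
        '<a class="py2" data-course="adv" href="/a">Advanced</a>')  # hypothetical sample
soup = BeautifulSoup(html, "html.parser")

# soup.find_all(data-course="basic") would be a SyntaxError, so the filter goes in attrs
basic = soup.find_all(attrs={"data-course": "basic"})
assert basic == soup.find_all(class_="py1")
print([a["href"] for a in basic])  # ['/b']
```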

Original post: https://www.cnblogs.com/pencil2001/p/13197203.html
