2020 Winter Break 12

Date: 2020-02-09 22:06:30

I ran into a problem:

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

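Fix: change "lxml" to "html.parser". The error means the lxml package is not installed, whereas html.parser ships with Python's standard library and needs no extra install. (Installing lxml, for example with pip install lxml, also resolves it.)

Change

soup = BeautifulSoup(content, "lxml")

to

soup = BeautifulSoup(content, "html.parser")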
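
As a minimal sketch of the same fix expressed in code: instead of hard-coding one parser, one could try lxml first and fall back to html.parser when bs4 raises FeatureNotFound. The helper name make_soup and the sample HTML are hypothetical and not part of the original code.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(content, preferred="lxml"):
    """Parse content with the preferred parser, falling back to html.parser.

    make_soup is a hypothetical helper used only for illustration.
    """
    try:
        return BeautifulSoup(content, preferred)
    except FeatureNotFound:
        # Raised when the requested parser library (e.g. lxml) is not installed.
        return BeautifulSoup(content, "html.parser")


soup = make_soup("<p>hello</p>")
text = soup.p.string  # text of the first <p> tag
print(text)
```

Whether a silent fallback is desirable depends on the project: the two parsers can build slightly different trees from malformed HTML, so pinning a single parser is the more predictable choice.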

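Today I learned a bit about Beautiful Soup in Python.

I learned how soup.find_all("tr", attrs={"class": "alt"}) relates to the document's tree structure, and how to use contents. find_all returns every matching tag (here, each <tr> whose class is alt), and contents is the list of a tag's direct children.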
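
To make the tree structure concrete, here is a small sketch that runs against a made-up HTML string instead of a real page. The table contents are invented sample data; only find_all, contents, and string come from the notes above.

```python
from bs4 import BeautifulSoup

# Invented sample markup, written without whitespace between tags so that
# each row's contents holds only its <td> children.
html = (
    '<table>'
    '<tr class="alt"><td>1</td><td>Example University A</td><td>Beijing</td></tr>'
    '<tr><td>skipped: no class attribute</td></tr>'
    '<tr class="alt"><td>2</td><td>Example University B</td><td>Shanghai</td></tr>'
    '</table>'
)

soup = BeautifulSoup(html, "html.parser")

# find_all walks the whole tree and returns every <tr> whose class is "alt".
rows = soup.find_all("tr", attrs={"class": "alt"})

# contents lists a tag's direct children; string is the text of a tag
# that has exactly one text child.
parsed = [[cell.string for cell in row.contents] for row in rows]

for cells in parsed:
    print(" ".join(cells))
```

The middle row is skipped because it has no class attribute, which is why the attrs filter matters.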

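I ran a simple scraper for a Chinese university ranking and read the Python code side by side with the page's HTML source to understand it.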

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request.
headers = {
    "User-Agent": "Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14"
}
response = requests.get("http://www.zuihaodaxue.com/zuihaodaxuepaiming2019.html", headers=headers)
response.encoding = "utf-8"  # decode the response body as UTF-8
if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    # Select the <tr class="alt"> rows of the ranking table.
    trTags = soup.find_all("tr", attrs={"class": "alt"})
    for trTag in trTags:
        # contents holds the row's direct children, in document order.
        rank = trTag.contents[0].string  # renamed from id, which shadows a builtin
        name = trTag.contents[1].string
        addr = trTag.contents[2].string
        sco = trTag.contents[3].string
        sco1 = trTag.contents[4].string
        print(f"{rank} {name} {addr} {sco} {sco1}")
    # This code comes from the internet and is fairly simple.
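
The loop above indexes into contents directly, which only lines up with the cells when there are no whitespace text nodes between them. A slightly more defensive sketch asks for the <td> children explicitly. It runs against a small made-up snippet, not the live page; the function name parse_rows and the sample data are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Invented, pretty-printed sample markup: the newlines and indentation
# between tags become text nodes in each row's contents.
SAMPLE_HTML = """
<table>
  <tr class="alt">
    <td>1</td>
    <td>Example University A</td>
    <td>Beijing</td>
  </tr>
  <tr class="alt">
    <td>2</td>
    <td>Example University B</td>
    <td>Shanghai</td>
  </tr>
</table>
"""


def parse_rows(html):
    """Return the cell texts of every <tr class="alt"> as lists of strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr", attrs={"class": "alt"}):
        # recursive=False keeps only the row's own <td> children,
        # skipping the whitespace text nodes that contents would include.
        cells = [td.get_text(strip=True) for td in tr.find_all("td", recursive=False)]
        rows.append(cells)
    return rows


first_row = BeautifulSoup(SAMPLE_HTML, "html.parser").find("tr", attrs={"class": "alt"})
# In this snippet, contents[0] is whitespace text, not the first <td>.
leading_node = first_row.contents[0]

for cells in parse_rows(SAMPLE_HTML):
    print(" ".join(cells))
```

Selecting cells by tag name keeps the column positions stable regardless of how the HTML is indented.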

Output of running the code:

[Image: screenshot of the program output]

Original post: https://www.cnblogs.com/lixv2018/p/12288915.html
