标签:from 源码分析 rar margin sel string 树形结构 nts html
发现一个问题:
bs4 FeatureNotFound: Couldn‘t find a tree builder with the features you requested: lxml. Do you need to install a parser library?
解决方法:将"lxml"改成"html.parser"
soup = BeautifulSoup(content, "lxml")改成
soup = BeautifulSoup(content, "html.parser")
今天学习了关于python中beautiful soup的一些内容
了解到soup.find_all("tr",attrs={"class":"alt"})中的树形结构以及contents的使用
运行了简单的爬取中国大学排名的代码,结合网页源代码进行python源码分析理解
1 import requests 2 from bs4 import BeautifulSoup 3 4 headers = { 5 "User-Agent": "Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14" 6 } 7 response = requests.get("http://www.zuihaodaxue.com/zuihaodaxuepaiming2019.html", headers=headers) 8 response.encoding = "utg-8" 9 if response.status_code == 200: 10 soup = BeautifulSoup(response.text,"html.parser") 11 trTags = soup.find_all("tr",attrs={"class":"alt"}) 12 for trTag in trTags: 13 id = trTag.contents[0].string 14 name = trTag.contents[1].string 15 addr = trTag.contents[2].string 16 sco = trTag.contents[3].string 17 sco1 = trTag.contents[4].string 18 print(f"{id} {name} {addr} {sco} {sco1}") 19 #代码来源于网络,比较简单
代码运行结果:
标签:from 源码分析 rar margin sel string 树形结构 nts html
原文地址:https://www.cnblogs.com/lixv2018/p/12288915.html