Fetching web page content with requests and parsing page elements with BeautifulSoup

Posted: 2017-08-21 15:52:35

Tags: from, request, get, sel, href, other, int, fetching web pages, text

import requests

newsUrl = 'http://news.sina.com.cn/china/'
res = requests.get(newsUrl)    # download the page
res.encoding = 'utf-8'         # decode the response body as UTF-8
print(res.text)
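Setting res.encoding matters because requests uses it to turn the raw bytes (res.content) into res.text. According to the requests documentation, when the Content-Type header is a text type without an explicit charset, requests falls back to ISO-8859-1. The stdlib-only sketch below shows what that fallback does to UTF-8 Chinese text; the byte string is a hypothetical page fragment, not something fetched from the site.

```python
# Hypothetical page fragment, encoded the way a UTF-8 server would send it
raw = '<title>国内新闻</title>'.encode('utf-8')

# Roughly what res.text would hold if the body were decoded with the ISO-8859-1 fallback
wrong = raw.decode('iso-8859-1')

# What res.text holds once res.encoding = 'utf-8' is set
right = raw.decode('utf-8')

print(wrong)  # mojibake: every UTF-8 byte becomes its own Latin-1 character
print(right)
```

If the page's encoding is not known in advance, res.encoding = res.apparent_encoding asks requests to guess it from the body instead.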

# Next, parse the page elements through the DOM tree

from bs4 import BeautifulSoup

html_sample = '''
<html>
<body>
<h1 id="title">this is h1</h1>
<a class="link" href="fdfdfdfd">this is a link</a>
<a class="link" href="fdfdfdfd">this is another link</a>
</body>
</html>
'''

# 'html.parser' selects Python's built-in parser; if no parser is given, BeautifulSoup emits a warning
soup = BeautifulSoup(html_sample, 'html.parser')
print(soup.text)
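soup.text simply concatenates every text node in the document. When markup has no whitespace between tags, the pieces run together. A small sketch, assuming the same elements as html_sample written on one line, comparing soup.text with get_text(), which accepts a separator and a strip flag:

```python
from bs4 import BeautifulSoup

# Same elements as html_sample, but with no whitespace between the tags
html = ('<html><body><h1 id="title">this is h1</h1>'
        '<a class="link" href="fdfdfdfd">this is a link</a>'
        '<a class="link" href="fdfdfdfd">this is another link</a>'
        '</body></html>')
soup = BeautifulSoup(html, 'html.parser')

# .text joins all text nodes with nothing in between
joined = soup.text
# get_text() can put a separator between pieces and strip whitespace from each one
separated = soup.get_text(separator=' | ', strip=True)

print(joined)
print(separated)
```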

# Find all HTML elements with a given tag

# 1: use select to find the elements with an h1 tag
header = soup.select('h1')
print(header)
print(header[0].text)  # text inside the first (index 0) matching tag
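select() always returns a list, so header[0] raises an IndexError on a page that has no h1. When only the first match is needed, select_one() returns that element, or None if nothing matches. A minimal sketch on a hypothetical fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with an h1 but no h2
html = '<h1 id="title">this is h1</h1><p>body text</p>'
soup = BeautifulSoup(html, 'html.parser')

headers = soup.select('h1')          # list of every match
first = soup.select_one('h1')        # first match only
missing = soup.select_one('h2')      # None, because there is no h2

if missing is None:
    print('no h2 on this page')
print(first.text)
```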

# 2: use select to find the elements with an a tag
alink = soup.select('a')
print(alink)
for link in alink:
    # print(link)
    print(link.text)
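select() takes a CSS selector; for a plain tag name, BeautifulSoup's find_all() gives the same elements. The sketch below collects the link texts both ways on the two anchors from html_sample, written as a list comprehension instead of a print loop:

```python
from bs4 import BeautifulSoup

html = ('<a class="link" href="fdfdfdfd">this is a link</a>'
        '<a class="link" href="fdfdfdfd">this is another link</a>')
soup = BeautifulSoup(html, 'html.parser')

# CSS-selector style lookup
via_select = [a.text for a in soup.select('a')]
# Equivalent lookup by tag name
via_find_all = [a.text for a in soup.find_all('a')]

print(via_select)
```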

# Get elements with specific CSS attributes

# 1: use select to find all elements whose id is title (prefix the id with #)
aTitle = soup.select('#title')
print(aTitle)
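An id selector can be combined with a tag name, and find() with a keyword argument offers a non-CSS alternative that returns a single element (or None). A sketch, assuming a fragment with the same h1 as html_sample:

```python
from bs4 import BeautifulSoup

html = ('<h1 id="title">this is h1</h1>'
        '<a class="link" href="fdfdfdfd">this is a link</a>')
soup = BeautifulSoup(html, 'html.parser')

by_id = soup.select('#title')            # any tag with id="title"
by_tag_and_id = soup.select('h1#title')  # must also be an h1
wrong_tag = soup.select('a#title')       # no <a> has id="title", so this is empty

# find() filters on the attribute and returns one Tag, not a list
found = soup.find(id='title')
print(found.name, found.text)
```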

# 2: use select to find all elements whose class is link (prefix the class with .)
for mylink in soup.select('.link'):
    print(mylink)
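An element can carry several classes. .link matches any element whose class list contains link, chaining classes (.link.external) requires all of them, and BeautifulSoup returns the class attribute as a list. The markup and the external class below are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical links; only the first one has the extra "external" class
html = ('<a class="link external" href="https://example.com">ext</a>'
        '<a class="link" href="/about">about</a>')
soup = BeautifulSoup(html, 'html.parser')

all_links = [a.text for a in soup.select('.link')]          # both anchors
external = [a.text for a in soup.select('.link.external')]  # only the first
classes = soup.select_one('.link')['class']                 # multi-valued attribute

print(all_links, external, classes)
```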

# Get the links inside all a tags

# use select to find the href of every a tag
ahref = soup.select('a')
for ah in ahref:
    print(ah['href'])
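On real pages, ah['href'] raises a KeyError for an a tag without an href, and many hrefs are relative. The sketch below uses the a[href] attribute selector, .get(), and urllib.parse.urljoin to resolve links against the news URL from the first example; the second and third anchors are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = 'http://news.sina.com.cn/china/'
html = ('<a class="link" href="fdfdfdfd">this is a link</a>'
        '<a class="link" href="/about">absolute path</a>'
        '<a name="anchor">no href here</a>')
soup = BeautifulSoup(html, 'html.parser')

# a[href] only selects anchors that actually have an href attribute
links = [urljoin(base_url, a['href']) for a in soup.select('a[href]')]

# .get() returns None instead of raising KeyError when the attribute is missing
hrefs = [a.get('href') for a in soup.select('a')]

print(links)
print(hrefs)
```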


Original post: http://www.cnblogs.com/tian-sun/p/7404394.html
