Python Practice

Date: 2020-02-03 22:04:01


Web scraping basics exercise: fetching data from a web page

Task: scrape the news items on the home page of http://www.cntour.cn/

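Analysis: work down the page structure, level by level, to the nodes that hold the data to scrape.

Inspecting the page gives the following CSS selector, which walks from the #main container down to each news link: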

#main>div>div.mtop.firstMod.clearfix>div.centerBox>ul.newsList>li>a
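To illustrate how this selector narrows the match, here is a minimal sketch that runs it against a small inline snippet rather than the live site. The markup in SAMPLE_HTML, the helper name select_links, and the headlines and links are all hypothetical. They only mimic the nesting the selector expects and are not taken from the real page. The sketch uses Python's built-in html.parser, so lxml is not needed here.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that mimics the nesting the selector expects.
SAMPLE_HTML = """
<div id="main">
  <div>
    <div class="mtop firstMod clearfix">
      <div class="centerBox">
        <ul class="newsList">
          <li><a href="/news/101.html">First headline</a></li>
          <li><a href="/news/102.html">Second headline</a></li>
        </ul>
      </div>
    </div>
  </div>
  <ul class="newsList"><li><a href="/other/1.html">Not matched</a></li></ul>
</div>
"""

SELECTOR = "#main>div>div.mtop.firstMod.clearfix>div.centerBox>ul.newsList>li>a"


def select_links(html):
    """Return (title, href) pairs for every <a> matched by SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.select(SELECTOR)]


matches = select_links(SAMPLE_HTML)
print(matches)
```

Because ">" matches direct children only, the second newsList, which sits directly under #main, is skipped even though its class name is the same.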

The code is as follows:

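import re

import requests                 # HTTP client used to download the page
from bs4 import BeautifulSoup   # HTML parser

url = 'http://www.cntour.cn/'
strhtml = requests.get(url)     # fetch the home page
soup = BeautifulSoup(strhtml.text, 'lxml')   # the 'lxml' parser requires the lxml package
# Select every news link in the news list on the home page
data = soup.select('#main>div>div.mtop.firstMod.clearfix>div.centerBox>ul.newsList>li>a')
for item in data:
    result = {
        'ID': re.findall(r'\d+', item.get('href')),   # every run of digits in the link
        'title': item.get_text(),
        'link': item.get('href')
    }
    print(result)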

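Running the script prints one dictionary per matched news link, with the keys ID, title, and link. The original screenshot of the output is not preserved.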
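Note that the ID field is a list, because re.findall returns every match, not just one. The sketch below shows that behavior next to a single-value alternative. The helper names extract_ids and first_id and the sample link are hypothetical, chosen only for illustration, and first_id is one possible variant, not the author's method.

```python
import re


def extract_ids(href):
    # Every run of digits in the link, which is what re.findall gives the ID field.
    return re.findall(r"\d+", href or "")


def first_id(href):
    # Alternative: only the first run of digits, or None when there is none.
    match = re.search(r"\d+", href or "")
    return match.group(0) if match else None


sample = "/html/2020/0203/5567.html"  # hypothetical link, not taken from the site
print(extract_ids(sample))
print(first_id(sample))
```

If a link has no href attribute, item.get('href') returns None and re.findall raises a TypeError. The "href or ''" guard in the sketch avoids that.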


Original article: https://www.cnblogs.com/madyina/p/12257503.html
