Tags: print, title, website, sel, sts, ges, sele, lis, web scraping
1. Pick a topic you are interested in; I chose Sohu News.
Website: http://news.sohu.com/
2. Scrape the relevant data from the web and print the results.
import requests
from bs4 import BeautifulSoup

url = 'http://news.sohu.com/'
res = requests.get(url)
res.encoding = 'UTF-8'
soup = BeautifulSoup(res.text, 'html.parser')
# Each .list16 block on the homepage holds a list of headline links;
# take the first <li> of each block and print its title and URL
for news in soup.select('.list16'):
    li = news.select('li')
    if len(li) > 0:
        title = li[0].text
        href = li[0].select('a')[0]['href']
        print(title, href)
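The selector logic above can be checked offline against a small HTML fragment, without hitting the live site. The markup below is a hypothetical stand-in for Sohu's .list16 blocks, not the site's actual HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the .list16 structure on news.sohu.com
html = """
<div class="list16">
  <ul>
    <li><a href="http://news.sohu.com/a1.shtml">Headline one</a></li>
    <li><a href="http://news.sohu.com/a2.shtml">Headline two</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
results = []
# Same loop as the scraper: first <li> per .list16 block
for news in soup.select('.list16'):
    li = news.select('li')
    if len(li) > 0:
        title = li[0].text
        href = li[0].select('a')[0]['href']
        results.append((title, href))
print(results)
```

Testing against a fixed snippet like this also guards against the site changing its markup: if Sohu renames the class, the loop silently yields nothing, which this kind of check makes visible.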
3. Analyze the text and generate a word cloud.
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read the saved news text
text = open("D:\\cc.txt", 'r', encoding='utf-8').read()
print(text)
# Segment the Chinese text with jieba so WordCloud can count words
wordlist = jieba.cut(text, cut_all=True)
wl_split = "/".join(wordlist)
# Generate from the segmented string, not the raw text; a CJK-capable
# font (e.g. simhei.ttf on Windows) is needed or Chinese renders as boxes
mywc = WordCloud(font_path='simhei.ttf').generate(wl_split)
plt.imshow(mywc)
plt.axis("off")
plt.show()
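Under the hood, WordCloud's sizing is driven by a word-frequency tally of the segmented string. That counting step can be sketched with the standard library alone; the sketch below uses a hand-written pre-segmented string in place of jieba's output, so it runs without any third-party packages:

```python
import re
from collections import Counter

# Stand-in for "/".join(jieba.cut(...)) from the script above (hypothetical sample)
wl_split = "新闻/热点/新闻/搜狐/热点/新闻"

# WordCloud does essentially this: split the string into word tokens
# on non-word characters, then count occurrences of each token
tokens = [t for t in re.split(r"[^\w]+", wl_split) if t]
freq = Counter(tokens)
print(freq.most_common(2))  # the two most frequent words drive the largest glyphs
```

This also shows why joining with "/" works: any non-word separator is discarded during tokenization, so the choice of delimiter does not affect the counts.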
4. Results
Original post: http://www.cnblogs.com/hzl123/p/7772912.html