
Scraping the News on the Campus News Homepage

Date: 2018-04-04 16:16:03

Tags: post   list   bre   attrs   campus   port   inf   rip   span

import requests
from bs4 import BeautifulSoup

# Fetch the campus news listing page and decode it as UTF-8.
resp = requests.get('http://news.gzcc.cn/html/xiaoyuanxinwen/')
resp.encoding = 'utf-8'
soup = BeautifulSoup(resp.text, 'html.parser')
# print(soup.select('li'))
for news in soup.select('li'):
    # Only <li> elements that contain a .news-list-title are news entries.
    if len(news.select('.news-list-title')) > 0:
        d = news.select('.news-list-title')[0].text        # title
        e = news.select('.news-list-description')[0].text  # description
        r = news.select('.news-list-info')[0].text         # info line
        # print(d)
        f = news.select('a')[0].attrs['href']              # link to the article
        # f = news.a.attrs['href']
        print(e, f)
        print(d, r)

        # Fetch the article page that the entry links to.
        res = requests.get(f)
        res.encoding = 'utf-8'
        soupd = BeautifulSoup(res.text, 'html.parser')
        # print(soupd.select('.show-content')[0].text)
        # Cut the .show-info text into its fields at fixed character offsets.
        info = soupd.select('.show-info')[0].text
        print(info[0:25])
        print(info[30:38])
        print(info[38:45])
        print(info[46:56])
        print(info[62:])
        break  # stop after the first news entry
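
The fixed offsets above only line up while every field in `.show-info` keeps the same width. As a hedged alternative, here is a minimal sketch that splits the text by field labels instead. It assumes the info line is a run of `label:value` pairs; the helper `parse_info`, the label names, and the sample string are all hypothetical and not taken from the original post, so check them against the real page before relying on them.

```python
import re


def parse_info(text, labels):
    """Split a 'label:value label:value ...' string into a dict.

    Each value runs from its label's colon to the start of the next
    known label, or to the end of the string for the last label.
    """
    # Match any known label followed by an ASCII or full-width colon.
    pattern = re.compile("(" + "|".join(map(re.escape, labels)) + ")[::]")
    matches = list(pattern.finditer(text))
    fields = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[m.group(1)] = text[m.end():end].strip()
    return fields


# Hypothetical sample; the real .show-info text may be laid out differently.
# Assumed labels: publish time, author, source, click count.
sample = "发布时间:2018-04-01 11:57:00  作者:张三  来源:学校  点击:12次"
info = parse_info(sample, ["发布时间", "作者", "来源", "点击"])
print(info["发布时间"])
print(info["作者"])
```

Because each value is located by its label, a longer author name or source no longer shifts the fields that follow it.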

 
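The loop above passes each `href` straight to `requests.get`, which only works when the link is an absolute URL. If the listing page ever used relative links, they would need to be resolved against the page URL first. A small sketch using the standard library's `urljoin` follows; the helper name `absolute_link` and the example paths are hypothetical.

```python
from urllib.parse import urljoin

# URL of the listing page, as used in the script above.
LIST_URL = "http://news.gzcc.cn/html/xiaoyuanxinwen/"


def absolute_link(href, base=LIST_URL):
    """Resolve an href found on the listing page against the page URL."""
    return urljoin(base, href)


# An absolute link passes through unchanged (hypothetical article path).
print(absolute_link("http://news.gzcc.cn/html/2018/example.html"))
# Hypothetical relative links are resolved against the listing URL.
print(absolute_link("/html/2018/example.html"))
print(absolute_link("index_2.html"))
```

With this in place, the fetch step would become `requests.get(absolute_link(f))`, which behaves the same for absolute links.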


Original post: https://www.cnblogs.com/168-hui/p/8717413.html
