Scraping the News from the Campus News Front Page

Posted: 2018-04-03 23:48:25


import requests
from bs4 import BeautifulSoup
import re
from datetime import datetime


# Article URLs, article titles, and article body texts.
add, new_list, pa = [], [], []
url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')
# Every link inside the news list container.
news = soup.select('div[class="list-container"] li a')
for item in news:
    # Read the href attribute directly instead of regex-matching the tag's HTML.
    add.append(item['href'])
    new_list.append(item.get_text().strip())
print(add)
print(new_list)
# Fetch the detail page of the first article only.
resd = requests.get(add[0])
resd.encoding = 'utf-8'
soupd = BeautifulSoup(resd.text, 'html.parser')
passage = soupd.select('div[class="show-container"]')
title = soupd.select('div[class="show-info"]')
# Iterate over passage itself; the original indexed passage by len(title).
for block in passage:
    pa.append(block.get_text().strip())
print(pa)
print(title)
# Pull dates of the form YYYY-MM-DD out of the info block.
tm = re.findall(r'\d\d\d\d-\d\d-\d\d', str(title))
print(tm)
# strptime parses a string into a datetime (strftime does the reverse),
# and it needs a single date string, not the whole list.
if tm:
    sst = datetime.strptime(tm[0], '%Y-%m-%d')
    print(sst)
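
The last step of the script, turning the matched date string into a datetime, can be sketched on its own as below. This is a minimal sketch, not the author's code: the helper name parse_publish_dates and the sample text are hypothetical, and the sample only stands in for whatever str(title) contains on the real page.

```python
import re
from datetime import datetime

# Equivalent to the \d\d\d\d-\d\d-\d\d pattern used in the script.
DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')


def parse_publish_dates(text):
    """Return every valid YYYY-MM-DD date found in text as a datetime."""
    dates = []
    for match in DATE_PATTERN.findall(text):
        try:
            # strptime parses a string into a datetime; strftime formats one back.
            dates.append(datetime.strptime(match, '%Y-%m-%d'))
        except ValueError:
            # Skip digit runs that look like a date but are not one, e.g. 2018-13-40.
            continue
    return dates


# Hypothetical sample text (an assumption, not taken from the real page).
sample = '<div class="show-info">Published: 2018-04-01 11:57:00</div>'
parsed = parse_publish_dates(sample)
print(parsed[0].strftime('%Y-%m-%d'))
```

Parsing each match separately avoids passing the whole list to the datetime call, and the try/except keeps one malformed match from aborting the run.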


Original article: https://www.cnblogs.com/miranda-76/p/8711234.html
