Tags: source img esc san obj title import highlight neu
import urllib.request as urllib2
from bs4 import BeautifulSoup

# Fetch the campus news list page and parse it
url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
request = urllib2.Request(url)
response = urllib2.urlopen(request)
bsObj = BeautifulSoup(response.read(), "html.parser")

# Only <li> elements that contain a .news-list-title are news entries
for i in bsObj.select('li'):
    if len(i.select('.news-list-title')) > 0:
        # .news-list-info holds the publication time, then the source
        time = i.select('.news-list-info')[0].contents[0].text
        source = i.select('.news-list-info')[0].contents[1].text
        title = i.select('.news-list-title')[0].text
        describe = i.select('.news-list-description')[0].text
        # Named link so the page URL above is not overwritten
        link = i.select('a')[0]['href']
        print(time, title, link, describe)
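The extraction step in the script above can also be written as a small function that takes an HTML string, which makes it possible to try out without a network connection. What follows is only a minimal sketch: the function name parse_news_list and the sample markup are hypothetical, modeled on the class names the script selects, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that reuses the class names from the script;
# the live page at news.gzcc.cn may be laid out differently.
SAMPLE_HTML = """
<ul>
  <li>
    <a href="http://example.com/news/1.html">
      <div class="news-list-title">Sample headline</div>
      <div class="news-list-description">Short summary of the story.</div>
      <div class="news-list-info"><span>2017-09-28</span><span>Campus News Office</span></div>
    </a>
  </li>
  <li>Navigation item without a news entry</li>
</ul>
"""


def parse_news_list(html):
    """Return one dict per <li> that contains a .news-list-title element."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for li in soup.select("li"):
        title = li.select_one(".news-list-title")
        if title is None:
            continue  # not a news entry, skip it
        info = li.select(".news-list-info span")  # assumed order: time, source
        desc = li.select_one(".news-list-description")
        link = li.select_one("a")
        items.append({
            "time": info[0].get_text(strip=True) if len(info) > 0 else "",
            "source": info[1].get_text(strip=True) if len(info) > 1 else "",
            "title": title.get_text(strip=True),
            "description": desc.get_text(strip=True) if desc is not None else "",
            "url": link["href"] if link is not None and link.has_attr("href") else "",
        })
    return items


news = parse_news_list(SAMPLE_HTML)
for item in news:
    print(item["time"], item["title"], item["url"], item["description"])
```

The same function should work on the bytes returned by response.read() in the script, or on the body of a response fetched with the requests library. Missing fields come back as empty strings instead of raising an IndexError.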
Scraping a news list with the requests and BeautifulSoup4 libraries
Original article: http://www.cnblogs.com/djye/p/7605401.html