
Exercise 3

Posted: 2015-08-04 23:15:28

Tags: web crawler

A simple little web crawler that lists news links from the 163.com homepage:


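#!/usr/bin/env python3
# urllib.request is Python 3's counterpart of Python 2's urllib2.
import urllib.request

import bs4

url = 'http://www.163.com'

# Fetch the homepage as raw bytes, then decode it as GBK.
with urllib.request.urlopen(url) as resp:
    content = resp.read()
content = content.decode('gbk')

# Name the parser explicitly so bs4 does not have to guess one.
soup = bs4.BeautifulSoup(content, 'html.parser')

# Every <a> that has an href attribute and sits inside an <li>.
links = soup.select('li a[href]')

# Keep 163.com article pages (.html) whose link text is longer than 3 characters.
result = []
for link in links:
    href = link.attrs['href']
    title = link.text
    if '.html' in href and '163.com' in href and len(title) > 3:
        result.append(link)

for link in result:
    print(link.attrs['href'], link.text)

# Substitute the count into the message with the % operator.
print('Found [%s] news items' % len(result))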
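To check the filtering rule without network access or a bs4 install, the selection and filter can be sketched with Python's standard-library html.parser in place of BeautifulSoup. This is a swapped-in technique, not the post's code: NewsLinkParser and filter_news_links are hypothetical names, and the sample HTML and its URLs are made up. The parser counts how many <li> elements are open to approximate the li a[href] selector, then applies the same three conditions as the script.

```python
from html.parser import HTMLParser


class NewsLinkParser(HTMLParser):
    """Hypothetical stdlib stand-in for soup.select('li a[href]').

    Collects (href, text) pairs for <a> elements that carry an href
    attribute and are nested somewhere inside an <li>.
    """

    def __init__(self):
        super().__init__()
        self.li_depth = 0    # number of <li> elements currently open
        self.current = None  # [href, text_parts] for the <a> being read
        self.links = []      # collected (href, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.li_depth += 1
        elif tag == 'a' and self.li_depth > 0:
            attr_map = dict(attrs)
            if 'href' in attr_map:
                # A valueless <a href> yields None; treat it as an empty string.
                self.current = [attr_map['href'] or '', []]

    def handle_endtag(self, tag):
        if tag == 'li' and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == 'a' and self.current is not None:
            href, parts = self.current
            self.links.append((href, ''.join(parts)))
            self.current = None

    def handle_data(self, data):
        # Text inside the open <a>, including text of nested tags.
        if self.current is not None:
            self.current[1].append(data)


def filter_news_links(html):
    """Apply the post's rule: '.html' and '163.com' in href, text longer than 3."""
    parser = NewsLinkParser()
    parser.feed(html)
    parser.close()
    return [(href, title) for href, title in parser.links
            if '.html' in href and '163.com' in href and len(title) > 3]


sample = '''
<ul>
  <li><a href="http://news.163.com/15/0804/a.html">Long headline here</a></li>
  <li><a href="http://news.163.com/15/0804/b.html">Hi</a></li>
  <li><a href="http://example.com/c.html">Another long title</a></li>
  <li><a href="http://news.163.com/">Section front page</a></li>
</ul>
<a href="http://news.163.com/15/0804/d.html">Outside any list item</a>
'''

for href, title in filter_news_links(sample):
    print(href, title)
```

Only the first link passes: the others fail the length check, the domain check, the .html check, or sit outside an <li>. Feeding the function the decoded homepage string should give pairs of the same shape as the script prints, although html.parser may treat malformed markup differently from the parser bs4 uses.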
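The script hard-codes content.decode('gbk'), which raises an error or garbles text if the server sends a different encoding. A minimal sketch, assuming the server declares its charset in the Content-Type header: read the charset with the standard library's email.message.Message and fall back to GBK, the encoding the post uses, when none is declared. decode_page is a hypothetical helper name, not part of the original post.

```python
from email.message import Message


def decode_page(raw, content_type, default='gbk'):
    """Hypothetical helper: decode page bytes using the declared charset.

    Falls back to `default` when the Content-Type header names no charset
    or names one Python does not recognise.
    """
    msg = Message()
    msg['Content-Type'] = content_type
    charset = msg.get_content_charset() or default  # lower-cased, or None
    try:
        return raw.decode(charset, errors='replace')
    except LookupError:  # unknown codec name in the header
        return raw.decode(default, errors='replace')


text = '网易新闻'
print(decode_page(text.encode('utf-8'), 'text/html; charset=UTF-8'))
print(decode_page(text.encode('gbk'), 'text/html'))
```

With urllib.request, the header value is available as resp.headers.get('Content-Type', ''). The response's headers object also offers get_content_charset() directly, because http.client.HTTPMessage derives from email.message.Message.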


This article comes from the "Linux_Config" blog; please keep this source attribution: http://liang1026.blog.51cto.com/10119067/1681675
