Search keywords: python, crawler. 2294 results found. 码迷, mamicode.com

[python crawler] urllib's urlretrieve fails with [socket error][Errno 10054]

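For practice, I wrote a crawler to scrape a "you know what" image host using the urlretrieve function. It was not only slow but kept failing, either with an open timeout or with the socket error in the title. I tried many fixes suggested online, such as calling close() after urllib2.Request.urlopen().read(), but none of them worked. Since I didn't want to bother with libraries like scrapy, I found a simple, brute-force approach: use urllib's built-in ope...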

Category: Programming languages | Time: 2015-04-26 13:54:36 | Views: 295
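The post's own fix is cut off in this excerpt, so the following is not its method. It is a minimal Python 3 sketch of a common alternative, assuming the failures are transient: open the URL with an explicit timeout, let a with-block close the response, and retry a few times. [Errno 10054] is the Windows code for a connection reset by the peer, which Python 3 raises as ConnectionResetError. The function name download and its retries/backoff parameters are illustrative, not from the post.

```python
import shutil
import socket
import time
import urllib.error
import urllib.request


def download(url, dest, retries=3, timeout=10, backoff=1.0):
    """Save url to dest, retrying on timeouts and connection resets.

    Illustrative helper (hypothetical name); retries must be >= 1.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # The with-block closes the response (and the file) even if
            # copying fails halfway, so no sockets are left dangling.
            with urllib.request.urlopen(url, timeout=timeout) as resp, \
                    open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            return dest
        except (urllib.error.URLError, socket.timeout, ConnectionResetError) as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * attempt)  # simple linear backoff
    raise last_error
```

Usage: download("http://example.com/a.jpg", "a.jpg", retries=5). If every attempt fails, the last exception is re-raised so the caller can log and skip that image.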

[python][crawler] Downloading images from a web page

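Note: this is only a test of image downloading and regular expressions. The test URL is a thread on the Iron Man Tieba (Baidu forum) that introduces each generation of armor. The code below downloads every image on the first page into the program's root directory. #!/usr/bin/env python #! -*- coding: utf-8 -*- import urllib,urllib2 import re # return the page source def getHtml(url): html = urllib2.urlopen...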

Category: Programming languages | Time: 2015-04-23 13:27:53 | Views: 184
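A minimal Python 3 sketch of the regex step described in the image-download post. The pattern assumes images appear as <img ... src="..."> tags ending in a common image extension; the post's actual regex is not shown in the excerpt, and find_image_urls and local_name are hypothetical helpers.

```python
import os
import re
from urllib.parse import urlparse

# Assumption: images appear as <img ... src="....jpg"> (or png/gif) in the source.
IMG_RE = re.compile(r'<img[^>]+src="([^"]+\.(?:jpg|jpeg|png|gif))"', re.IGNORECASE)


def find_image_urls(html):
    """Return image URLs in page order, with duplicates removed."""
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(IMG_RE.findall(html)))


def local_name(url, index):
    """Build a numbered file name such as '0003.jpg' from the URL's extension."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    return "%04d%s" % (index, ext)
```

Numbering the files instead of reusing the remote names avoids collisions when different threads post images with the same file name.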

[python][crawler] Downloading GIFs from 暴漫 (Baozou Manhua)

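Note: this is much like the previous Baidu Tieba image downloader, with a modified regex and added page-number control; output-format control was added as well. If you want to let the user set the save path manually, see the earlier Baidu Tieba crawler. #!/usr/bin/env python #! -*- coding: utf-8 -*- # sample image address: src="http://ww2.sinaimg.cn/large/005Yan1vjw1erf95qkbfog307e08uu0y.gif...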

Category: Programming languages | Time: 2015-04-23 13:18:49 | Views: 228
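For the GIF downloader, a sketch of the regex, the page-number control and the progress output, with the pattern modelled on the sample address quoted in the post (http://ww2.sinaimg.cn/large/....gif). The listing-page URL template is a hypothetical placeholder, since the excerpt does not give the site's real paging scheme; page_urls, find_gifs and progress are illustrative names.

```python
import re

# Modelled on the post's sample: src="http://ww2.sinaimg.cn/large/<id>.gif"
GIF_RE = re.compile(r'src="(http://ww\d\.sinaimg\.cn/large/\w+\.gif)"')


def page_urls(template, first, last):
    """Expand a hypothetical '...page={}' template into one URL per page."""
    return [template.format(n) for n in range(first, last + 1)]


def find_gifs(html):
    """Return every sinaimg GIF address found in the page source."""
    return GIF_RE.findall(html)


def progress(page, index, total, url):
    """One status line per file, e.g. '[page 2] 3/10 <name>.gif'."""
    return "[page %d] %d/%d %s" % (page, index, total, url.rsplit("/", 1)[-1])
```

A main loop would iterate over page_urls(...), fetch each page, and print progress(...) before saving each GIF.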

[python] Qiushibaike (糗百) hot-posts crawler v2.0 [updated 15/4/21]

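I had just tested the Qiushibaike crawler, and the very next day Qiushibaike changed the format of its page source = = so I rewrote the regular expressions and am posting it again: #! -*- coding:utf-8 -*- #! usr/bin/python''' #===================================================== # FileName: Spider_qb.py # Describe: download jokes from Qiushibaike and display them one after another #...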

Category: Programming languages | Time: 2015-04-22 09:38:51 | Views: 200
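The v2.0 note shows how regex scrapers break whenever a site's markup changes. A sketch that keeps the pattern in one place, cleans the matched text, and then shows the jokes one at a time. The <div class="content"> markup is hypothetical, not Qiushibaike's actual layout, and parse_jokes and play are illustrative names.

```python
import html
import re

# Hypothetical markup: each joke sits in <div class="content">...</div>.
# re.S lets '.' match newlines, since a joke usually spans several lines.
JOKE_RE = re.compile(r'<div class="content">\s*(.*?)\s*</div>', re.S)
TAG_RE = re.compile(r"<[^>]+>")


def parse_jokes(page):
    """Return joke texts with <br/> turned into newlines and other tags stripped."""
    jokes = []
    for raw in JOKE_RE.findall(page):
        text = re.sub(r"<br\s*/?>", "\n", raw)
        text = TAG_RE.sub("", text)
        jokes.append(html.unescape(text).strip())  # &amp; -> & and so on
    return jokes


def play(jokes, read=input):
    """Print jokes one at a time; entering 'q' stops early."""
    for i, joke in enumerate(jokes, 1):
        print("%d. %s" % (i, joke))
        if read("Enter for next, q to quit: ").strip().lower() == "q":
            break
```

When the site changes its layout again, only JOKE_RE needs to be updated; the cleanup and display code stay the same.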

[python] Qiushibaike (糗百) hot-posts crawler

A few small modifications, plus detailed comments: #! -*- coding:utf-8 -*- #! usr/bin/python''' #===================================================== # FileName: Spider_qb.py # Describe: download jokes from Qiushibaike and display them one after another # Modifier: sunny # Sinc...

Category: Programming languages | Time: 2015-04-20 18:38:44 | Views: 143

A small demo of how a Python crawler works

Walkthrough: import urllib # load urllib import webbrowser url = 'http://blog.csdn.net/xlgen157387' content = urllib.urlopen(url).read() open('test.html','w').write(content) # write the page to test.html webbrowser.open_new_...

Category: Programming languages | Time: 2015-04-18 16:11:40 | Views: 175
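The demo above is Python 2 (urllib.urlopen, text-mode write). A Python 3 sketch of the same fetch, save, open flow: read() now returns bytes, so the file is opened in "wb" mode. The excerpt's browser call is cut off; this sketch assumes webbrowser.open_new_tab and only calls it when asked. save_page is an illustrative name.

```python
import pathlib
import urllib.request
import webbrowser


def save_page(url, path="test.html", open_in_browser=False):
    """Fetch url, write the raw bytes to path, optionally open the local copy."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        content = resp.read()          # bytes in Python 3, not str
    with open(path, "wb") as f:        # "wb": write the bytes unchanged
        f.write(content)
    if open_in_browser:
        # Browsers need an absolute file:// URI for local files.
        webbrowser.open_new_tab(pathlib.Path(path).resolve().as_uri())
    return len(content)
```

Usage: save_page("http://blog.csdn.net/xlgen157387", open_in_browser=True). Writing bytes also sidesteps guessing the page's encoding.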

[Python] Web crawler: BUPT (北邮) library rankings

BUPT library crawler...

Category: Programming languages | Time: 2015-04-17 14:02:36 | Views: 253

Python Crawler Csdn Series III

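Python Crawler Csdn Series III, by 白熊花田 (http://blog.csdn.net/whiterbear). Please credit the source when reposting, thanks. Note: in the previous post we were already able to collect the links to all of a user's articles, so the natural next step is to download those posts. Analysis: with the links in hand, downloading the articles is not hard. But how should the fetched data be handled? Each...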

Category: Programming languages | Time: 2015-04-11 16:23:12 | Views: 189
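How the Csdn post handles the fetched data is cut off in the excerpt. One simple option, sketched here under that caveat: save each article to its own file named after a sanitized title, skipping files that already exist so an interrupted run can resume. fetch is a hypothetical callable you supply (for example, a urllib-based downloader); safe_filename and save_articles are illustrative names.

```python
import os
import re


def safe_filename(title, max_len=80):
    """Replace characters that Windows or Unix disallow in file names."""
    name = re.sub(r'[\\/:*?"<>|\r\n\t]+', "_", title).strip(" ._")
    return (name or "untitled")[:max_len]


def save_articles(articles, out_dir, fetch):
    """Save each (title, url) pair as <title>.html in out_dir.

    fetch(url) -> str is a hypothetical downloader passed in by the caller.
    Files already on disk are skipped; returns the paths written this run.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for title, url in articles:
        path = os.path.join(out_dir, safe_filename(title) + ".html")
        if os.path.exists(path):
            continue  # already downloaded (or a duplicate title)
        with open(path, "w", encoding="utf-8") as f:
            f.write(fetch(url))
        written.append(path)
    return written
```

Passing fetch in as a parameter keeps the storage logic testable without network access.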

Scraping pictures of beautiful women with a Python crawler

Scraping pictures of beautiful women with a Python crawler: #coding=utf-8 import urllib import re import os import time import threading def getHtml(url): page = urllib.urlopen(url) html = page.read() return html def getImg...

Category: Programming languages | Time: 2015-04-11 09:02:27 | Views: 226
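The excerpt imports threading to parallelize the image downloads. A sketch of the same idea using concurrent.futures.ThreadPoolExecutor instead (a swapped-in technique, not the post's code): each URL is fetched in a worker thread, and failures are collected rather than aborting the batch. The 0.jpg, 1.jpg naming scheme and the helper names fetch_to and download_all are assumptions.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_to(url, path, timeout=10):
    """Download one URL into path and return the path."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path


def download_all(urls, out_dir, workers=4):
    """Download urls concurrently into out_dir as 0.jpg, 1.jpg, ... (assumed naming).

    Returns {url: path or exception} so one failure does not stop the rest.
    """
    os.makedirs(out_dir, exist_ok=True)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(fetch_to, url, os.path.join(out_dir, "%d.jpg" % i)): url
            for i, url in enumerate(urls)
        }
        for fut, url in futures.items():
            try:
                results[url] = fut.result()
            except OSError as exc:  # URLError and socket errors subclass OSError
                results[url] = exc
    return results
```

The executor's with-block waits for all workers before returning, so there is no need to join threads by hand.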

A simple crawler with python + pyspider + phantomjs

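This article has two goals: 1. record the process of setting up the crawler environment; 2. summarize lessons learned from the crawler project. I. System environment: this setup was tested on 32-bit Ubuntu 10.04 and 64-bit CentOS 6.9. The required software is: 1. either Ubuntu 10.04 or CentOS 6.9 (the rest of this post mainly uses CentOS 6.9); 2. the pyspider source code, which can be downloaded from http://download.csdn.net/detail...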

Category: Programming languages | Time: 2015-04-10 20:13:53 | Views: 1345

