
Scraping Images with a Python Crawler, Reading the URLs from a File

Date: 2015-04-01 09:37:09

Tags: python, image-scraping crawler, python modules, regular expressions

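Steps for downloading images from the web with Python:

1. Fetch the page source for a given URL.

2. Use a regular expression to extract the image addresses from the source.

3. Download the images from the extracted addresses.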
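Step 2 can be tried on its own before touching the network. The sketch below applies the same pattern the script uses to a hand-written HTML fragment. The fragment, the example.com addresses, and the helper name `find_image_urls` are made up for illustration; the pattern only matches tags in which a `pic_ext` attribute directly follows `src`.

```python
import re

# Same pattern as in the script: capture a .jpg address that is
# immediately followed by a pic_ext attribute.
IMG_PATTERN = re.compile(r'src="(.+?\.jpg)" pic_ext')


def find_image_urls(html):
    """Return every .jpg address matched by IMG_PATTERN, in page order."""
    return IMG_PATTERN.findall(html)


# Hypothetical HTML fragment, one tag per line, for illustration only.
sample_html = "\n".join([
    '<img src="http://example.com/a.jpg" pic_ext="jpeg">',
    '<img src="http://example.com/logo.jpg">',  # no pic_ext, so not matched
    '<img src="http://example.com/c.jpg" pic_ext="jpeg">',
])

print(find_image_urls(sample_html))
# → ['http://example.com/a.jpg', 'http://example.com/c.jpg']
```

Each tag sits on its own line here because `.` does not match a newline. On a page where several tags share one long line, the non-greedy group can still run from one tag into the next, so the matches are worth checking by eye.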

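import os
import re
import urllib.request

# urllib, re, and os are all Python standard-library modules (Python 3 syntax)


def gethtml(outline):
    # Fetch the page source for the given URL
    page = urllib.request.urlopen(outline)
    html = page.read().decode("utf-8", errors="ignore")
    page.close()
    return html


def getimg(html):
    # Extract the image links and download them into a "pictures"
    # folder under the current working directory
    global x
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    if not imglist:
        print("not found")
    else:
        filepath = os.path.join(os.getcwd(), "pictures")
        print(filepath)
        if not os.path.exists(filepath):
            os.mkdir(filepath)
        for imgurl in imglist:
            temp = os.path.join(filepath, "%s.jpg" % x)
            print(imgurl)
            urllib.request.urlretrieve(imgurl, temp)
            x = x + 1


x = 0  # running counter used to name the downloaded files

# All page URLs are stored in this file, one per line
with open("img_path.txt") as fp:
    for line in fp:
        outline = line.strip()
        if len(outline) == 0:
            break  # stop at the first empty line
        print(outline)
        html = gethtml(outline)
        getimg(html)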
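The script hands each matched address straight to `urlretrieve`, which works only when the page uses absolute addresses, and a single failed request stops the whole run. Below is a minimal sketch of one way to harden that step, assuming Python 3. `resolve_image_urls` and `download_all` are hypothetical helper names and not part of the original script.

```python
import os
import urllib.request
from urllib.parse import urljoin


def resolve_image_urls(page_url, img_srcs):
    """Turn possibly relative src values into absolute URLs."""
    return [urljoin(page_url, src) for src in img_srcs]


def download_all(img_urls, folder, start=0):
    """Save each URL as <n>.jpg in folder and return the next free counter."""
    os.makedirs(folder, exist_ok=True)
    count = start
    for url in img_urls:
        target = os.path.join(folder, "%d.jpg" % count)
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as exc:
            # urllib.error.URLError is a subclass of OSError; ValueError
            # covers malformed addresses. Skip the image and keep going.
            print("skipped %s: %s" % (url, exc))
            continue
        count += 1
    return count
```

Inside a function like `getimg`, this could be used as `x = download_all(resolve_image_urls(outline, imglist), filepath, x)`, provided the page URL is passed in alongside the HTML.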


Original article: http://blog.csdn.net/gamer_gyt/article/details/44788541

