Downloading multiple files with Python

Posted: 2014-12-09 17:07:10

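# -*- coding: utf-8 -*-
__author__ = 'Administrator'
import urllib2, urllib, os, re

def Url1(url):  # multiple files: download the page HTML and pick out the first post
    openr = urllib2.build_opener()
    # Without a User-agent header the request fails with 403 and garbled text.
    openr.addheaders = [('User-agent', 'Mozilla/5.0')]
    html = openr.open(url).read()
    regfloor = '<div class="msgfont">(.*?)</div>'
    # re.S lets "." match newlines, in case the post body spans several lines.
    html1 = re.search(regfloor, html, re.S)
    if html1 is None:
        return u''  # no first-post block found on the page
    html = html1.group()
    # This file is saved and edited as UTF-8, so decode once; otherwise the
    # text shows up garbled, although the result is not affected.
    return html.decode('utf-8')

def getimg(url):
    # Find the image URLs in the first post, then download, name and save them.
    pagehtml = Url1(url)
    reg = '<img src="(.*?)" />'  # matches every image URL
    imag = re.findall(reg, pagehtml)
    save_dir = r'G:\pic'  # target folder; it must already exist
    for index in xrange(len(imag)):
        pic = str(index + 1) + '.jpg'
        fine = os.path.join(save_dir, pic)
        urllib.urlretrieve(imag[index], fine)
        print fine + ' ok'

url = 'http://wangwei007.blog.51cto.com/68019/1351429'
getimg(url)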
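
The script above targets Python 2 (urllib2, xrange, the print statement). For readers on Python 3, the same idea might look roughly like the sketch below. It is not the author's code: the function names (extract_image_urls, target_paths, fetch_html, download_images) are made up for illustration, the two regular expressions are taken from the script, and the page is assumed to be UTF-8 as the script assumes. The re.S flag and the creation of the target folder are additions.

```python
import os
import re
import urllib.request

# Patterns taken from the script above; re.S (an addition) lets "." span line breaks.
FLOOR_RE = re.compile(r'<div class="msgfont">(.*?)</div>', re.S)
IMG_RE = re.compile(r'<img src="(.*?)" />')


def extract_image_urls(html):
    """Return the image URLs found inside the first msgfont <div>."""
    match = FLOOR_RE.search(html)
    if match is None:
        return []  # page has no first-post block
    return IMG_RE.findall(match.group(1))


def target_paths(urls, save_dir):
    """Pair each URL with a numbered .jpg path: 1.jpg, 2.jpg, ..."""
    return [(u, os.path.join(save_dir, "%d.jpg" % i))
            for i, u in enumerate(urls, start=1)]


def fetch_html(url):
    """Download a page, sending a browser-like User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        # Assumption: the page is UTF-8, as the original script assumes.
        return response.read().decode("utf-8")


def download_images(page_url, save_dir):
    """Save every image from the first post of page_url into save_dir."""
    os.makedirs(save_dir, exist_ok=True)  # create the folder if it is missing
    urls = extract_image_urls(fetch_html(page_url))
    for img_url, path in target_paths(urls, save_dir):
        urllib.request.urlretrieve(img_url, path)
        print(path, "ok")
```

Calling download_images with the page URL and folder used in the script would fetch the page, pull the image URLs out of the first post, and save them as 1.jpg, 2.jpg, and so on. Keeping the parsing (extract_image_urls) separate from the network calls makes it possible to check the regular expressions against a saved HTML string without touching the site.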


Original article: http://www.cnblogs.com/mhxy13867806343/p/4153475.html
