Python crawler: downloading images from the China.com photo library

# -*- coding: utf-8 -*-
import requests
import re
import sys

# Python 2: force the default encoding to UTF-8 so the Chinese page
# content can be handled without UnicodeDecodeError.
reload(sys)
sys.setdefaultencoding('utf-8')

if __name__ == '__main__':
    # Special-topic photo page on photostock.china.com.cn
    url = 'http://photostock.china.com.cn/Web_CHN/SpecialTopicPhoto.aspx?Id=296'
    html = requests.get(url)
    # Pull the relative image paths out of the <img> tags; the ".." in the
    # pattern skips the leading ".." of each relative src.
    img_src = re.findall(r'<img alt=.*?src="..(.*?)".*?/>', html.text, re.S)
    imgUrl = []
    for each_src in img_src:
        imgUrl.append("http://photostock.china.com.cn" + each_src)
    # Save the images as 100.jpg, 101.jpg, ... under lovelyAnimals/
    picName = 100
    for each in imgUrl:
        imgContext = requests.get(each).content
        with open("lovelyAnimals/" + str(picName) + ".jpg", "wb") as code:
            code.write(imgContext)
        picName += 1
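
Not part of the original post: a minimal sketch of a slightly more defensive version of the same download loop, still using requests. The download_images helper name, the os.makedirs directory check, and the HTTP status-code check are my additions, not something the original script does.

import os
import requests

def download_images(img_urls, out_dir="lovelyAnimals", start_name=100):
    # Create the output directory up front; otherwise open() fails
    # if lovelyAnimals/ does not already exist.
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    pic_name = start_name
    for img_url in img_urls:
        resp = requests.get(img_url)
        # Skip broken links instead of writing an error page to disk.
        if resp.status_code != 200:
            continue
        with open(os.path.join(out_dir, str(pic_name) + ".jpg"), "wb") as f:
            f.write(resp.content)
        pic_name += 1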

'''
Three ways to download a file

(1) urllib.urlretrieve: download progress can be reported in the callback function
import urllib

def callbackfunc(blocknum, blocksize, totalsize):
    # Progress callback
    # @blocknum:  number of data blocks downloaded so far
    # @blocksize: size of each data block
    # @totalsize: size of the remote file
    percent = 100.0 * blocknum * blocksize / totalsize
    if percent > 100:
        percent = 100
    print "%.2f%%" % percent

url = 'http://www.sina.com.cn'
local = 'lovelyAnimals/sina.html'
urllib.urlretrieve(url, local, callbackfunc)

(2) urllib2.urlopen
import urllib2

url = 'http://www.sina.com.cn'
f = urllib2.urlopen(url)
data = f.read()
with open("lovelyAnimals/sina.html", "wb") as code:
    code.write(data)

(3) The requests module
import requests

url = 'http://www.sina.com.cn'
html = requests.get(url)
with open("lovelyAnimals/sina.html", "wb") as code:
    code.write(html.content)
'''
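
The three snippets above target Python 2 (urllib.urlretrieve, urllib2, and the print statement). As a rough sketch, not part of the original post, the same three approaches on Python 3 look roughly like this; the first two now live in urllib.request, and the URL and output path are simply reused from the note above.

# Python 3 equivalents of the three approaches (sketch).
import urllib.request
import requests

url = 'http://www.sina.com.cn'

# (1) urllib.request.urlretrieve with a progress callback (reporthook)
def report(blocknum, blocksize, totalsize):
    percent = min(100.0, 100.0 * blocknum * blocksize / totalsize)
    print("%.2f%%" % percent)

urllib.request.urlretrieve(url, 'lovelyAnimals/sina.html', report)

# (2) urllib.request.urlopen replaces urllib2.urlopen
with urllib.request.urlopen(url) as f:
    data = f.read()
with open("lovelyAnimals/sina.html", "wb") as out:
    out.write(data)

# (3) requests works the same on Python 3
html = requests.get(url)
with open("lovelyAnimals/sina.html", "wb") as out:
    out.write(html.content)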

Original post: http://www.cnblogs.com/ponpon7/p/5014843.html
