Worried that Renren might shut down, I wrote a crawler to download all the photos from my albums. The code is as follows:
# -*- coding: utf-8 -*-
import requests
import json
import os


def mkdir(path):
    path = path.strip()
    path = path.rstrip("\\")
    isExists = os.path.exists(path)
    if not isExists:
        print path + u' created successfully'
        os.makedirs(path)
    else:
        print path + u' directory already exists'


# Main routine when the script is run directly
if __name__ == '__main__':
    # Create a requests session so the login cookies are reused
    s = requests.Session()
    # Initial URL - log in to Renren
    origin_url = 'http://www.renren.com'
    login_data = {
        'email': 'your account',
        'domain': 'renren.com',
        'origURL': 'http://www.renren.com/home',
        'key_id': '1',
        'captcha_type': 'web_login',
        'password': 'obtained via packet capture',
        'rkey': 'obtained via packet capture'
    }
    r = s.post("http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=2016742045262", data=login_data)
    if 'true' in r.content:
        print u'Logged in to Renren successfully'
        # Visit the album
        albums = ['album-987919974']
        album_url = 'http://photo.renren.com/photo/278382090/' + albums[0] + '/v7'
        r = s.get(album_url)
        if "photoId" in r.content:
            print u'Entered the album successfully'
            # print r.content
            content = r.content
            # The album data is embedded in the page as "nx.data.photo = ...; define.config"
            index1 = content.find('nx.data.photo = ')
            print index1
            index2 = content.find('; define.config')
            print index2
            target_json = content[index1 + 16:index2].strip()
            # Strip the JS wrapper around the JSON payload
            target_json = target_json[13:len(target_json) - 2]
            print target_json
            data = json.loads(target_json.replace("'", '"'))
            photos = data['photoList']
            album_name = data['albumName']
            # Define and create the target directory
            album_path = 'd:\\' + album_name
            print album_path
            mkdir(album_path)
            # Download each photo and save it as <photoId>.jpg
            for photo in photos:
                # print photo['url']
                image_name = photo['photoId']
                photo_url = photo['url']
                r = requests.get(photo_url)
                image_path = album_path + '/' + image_name + '.jpg'
                f = open(image_path, 'wb')
                f.write(r.content)
                f.close()
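The trickiest step above is pulling the album JSON out of the page, which the script does with hard-coded offsets around the 'nx.data.photo = ' and '; define.config' markers. As a rough alternative, a regular expression can capture whatever sits between those two markers, which is a little less fragile if the surrounding whitespace shifts. This is only a sketch based on the markers used above; the exact JS wrapper around the JSON (which the original trims with a hand-tuned slice) is an assumption and may need adjusting to the actual page source.

import re
import json

def extract_photo_json(page_html):
    # Capture everything between the two markers seen on the album page.
    # The exact page layout is an assumption taken from the script above.
    m = re.search(r"nx\.data\.photo\s*=\s*(.+?);\s*define\.config", page_html, re.S)
    if m is None:
        return None
    raw = m.group(1).strip()
    # The original script trims a JS wrapper here (raw[13:-2]); whether this
    # is still needed depends on how the page embeds the data.
    raw = raw[13:len(raw) - 2]
    return json.loads(raw.replace("'", '"'))

With a helper like this, the block above could call data = extract_photo_json(r.content) in place of the manual find/slice calls.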
Done! Here is what it looks like when run:
Original post: http://www.cnblogs.com/LanTianYou/p/5785830.html