
Python Web Crawler

Date: 2019-05-20 17:00:24

Tags: xpath tag gecko tree content split file img get

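import os

import requests as app
from lxml import etree


# Browser-like User-Agent sent with every request
header = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36"
}

def getImg(url, heade=None):
    # Fetch the listing page and parse it into an element tree
    resp = app.get(url, headers=heade)
    html = etree.HTML(resp.text)
    # Collect the data-original attribute of every <img> (presumably the real image URL on a lazy-loading page)
    imgs = html.xpath(".//img/@data-original")
    # Create the output directory if it is missing, otherwise open() below fails
    os.makedirs("./img", exist_ok=True)
    for img in imgs:
        # Use the last path segment of the image URL as the file name
        filename = img.split("/")[-1]
        # Pass headers by keyword: the second positional argument of get() is params
        fileimg = app.get(img, headers=heade)
        with open("./img/{}".format(filename), "wb") as f:
            f.write(fileimg.content)


# Crawl listing pages 1 through 5
for i in range(1,6):
    url = "http://www.win4000.com/meinvtag4_{}.html".format(i)
    getImg(url,heade=header)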
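
The script names each saved file after the last path segment of its image URL. The following is a minimal standard-library sketch of that one step, assuming an image URL may carry a query string or end in a slash. The helper name `filename_from_url` and its `fallback` parameter are hypothetical and do not appear in the original script.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(img_url, fallback="unnamed.jpg"):
    """Return the last path segment of img_url, ignoring query and fragment.

    Hypothetical helper for illustration; not part of the original script.
    """
    # urlsplit separates the path from any "?query" or "#fragment" part
    path = urlsplit(img_url).path
    name = posixpath.basename(path)
    # A path ending in "/" has no file name, so use the fallback instead
    return name or fallback


print(filename_from_url("http://example.com/pics/cat.jpg?size=large"))
```

A plain `split("/")[-1]` would keep the `?size=large` suffix in the file name. Splitting the URL first avoids that, at the cost of two extra imports.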

 


Original article: https://www.cnblogs.com/kjtt/p/10894917.html
