Scraping images from Chinaz (站长素材) with Selenium

Date: 2019-05-06 18:56:21

Tags: eve, material, port, gpu, request, htm, ons, body, name

# Standard library
import os
from urllib import request

# Third-party: lxml for parsing, selenium for driving the browser
from lxml import etree
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

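# Run Chrome headless (no visible browser window)
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

# executable_path is the Selenium 3 style; newer Selenium 4 releases take a Service object instead
pro = webdriver.Chrome(executable_path='./chromedriver.exe', options=chrome_options)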

url = "http://sc.chinaz.com/tupian/haiyangshengwutupian.html"
pro.get(url)
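# Scroll to the bottom of the page before reading the rendered HTML
# (presumably so that lazily loaded images get their real src)
js = 'window.scrollTo(0,document.body.scrollHeight)'
pro.execute_script(js)
page_text = pro.page_source
pro.quit()  # close the browser once the page source has been captured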

tree = etree.HTML(page_text)
url_img = tree.xpath('//div[@id="container"]/div[@class="box picblock col3 masonry-brick"]/div/a/img/@src')  # list of image URLs
names = tree.xpath('//div[@id="container"]/div[@class="box picblock col3 masonry-brick"]/div/a/@alt')   # list of image names

if not os.path.exists('./img'):  # create the output folder
    os.mkdir('./img')
for index, url in enumerate(url_img):
    img_path = './img/' + names[index] + '.jpg'  # build the file path from the image name
    request.urlretrieve(url, img_path)
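
The download loop above uses each scraped name directly as a file name and indexes `names` by position. That can fail if a name contains characters that are not allowed in file names, if two images share a name, or if the two lists differ in length. The following is a minimal sketch of one way to guard against this. The helpers `safe_name` and `build_targets` are hypothetical names introduced here for illustration and are not part of the original script.

```python
import os
import re


def safe_name(name, fallback="image"):
    """Replace characters that are problematic in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", name).strip("_")
    return cleaned or fallback


def build_targets(urls, names, out_dir="./img"):
    """Pair each URL with a unique .jpg path; zip() stops at the shorter list."""
    used = set()
    targets = []
    for url, name in zip(urls, names):
        base = safe_name(name)
        filename, n = base, 1
        # Append a counter until the file name is unique within this run
        while filename in used:
            filename = f"{base}_{n}"
            n += 1
        used.add(filename)
        targets.append((url, os.path.join(out_dir, filename + ".jpg")))
    return targets
```

Under these assumptions, the loop could become `for url, img_path in build_targets(url_img, names): request.urlretrieve(url, img_path)`.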


Original article: https://www.cnblogs.com/wangtaobiu/p/10821077.html
