
Scraping JD.com Product Listings with Selenium

Date: 2019-04-28 22:01:40


import csv

from lxml import etree
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

# Product name to search for
shopname = "Python设计模式"

# Create the browser object and open the JD home page
browser = webdriver.Chrome()
browser.get("https://www.jd.com")

# Locate the search box and type the query
inputtext = browser.find_element(By.CLASS_NAME, 'text')
inputtext.send_keys(shopname)

# Submit the search
btn = browser.find_element(By.CLASS_NAME, 'button')
btn.click()

# Explicit wait: continue once the results page title contains the query
wait = WebDriverWait(browser, 10)
wait.until(ec.title_contains(shopname))

# newline='' stops the csv module from writing blank rows on Windows
with open(shopname + ".csv", 'a', newline='', encoding='utf-8') as f:
    wr = csv.DictWriter(f, ['name', 'price', 'shop'])
    wr.writeheader()
    while True:
        # An extra window may be an anti-scraping popup: close it,
        # then switch back to the results window
        if len(browser.window_handles) > 1:
            main_handle = browser.window_handles[0]
            browser.switch_to.window(browser.window_handles[1])
            browser.close()
            browser.switch_to.window(main_handle)
        # Scroll to the bottom so the rest of the page loads
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        wait.until(ec.presence_of_element_located((By.CLASS_NAME, 'pn-next')))
        # Parse the rendered page
        html = etree.HTML(browser.page_source)
        # One node per product
        shops = html.xpath('//div[contains(@class,"gl-i-wrap")]')
        # Non-empty only on the last page, where the "next" link is disabled
        npage = html.xpath('//a[@class="pn-next disabled"]/em//text()')
        for shop in shops:
            name = shop.xpath('.//div[contains(@class,"p-name")]//em//text()')
            name = "".join(name)
            price = shop.xpath('.//div[contains(@class,"p-price")]//i//text()')
            price = "".join(price)
            sname = shop.xpath('.//div[contains(@class,"p-shop")]//a//@title')
            sname = "".join(sname)
            # No shop title: record the item as "京东自营" (JD self-operated)
            if sname.strip() == '':
                sname = "京东自营"
            wr.writerow({'name': name, 'price': price, 'shop': sname})
        if len(npage) > 0:
            break
        try:
            pbtn = browser.find_element(By.CLASS_NAME, "pn-next")
            pbtn.click()
        except WebDriverException:
            # Ignore a failed click; the next iteration tries again
            pass

# Close every window and stop the driver process
browser.quit()
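The per-product step in the loop (join the XPath text fragments, fall back to "京东自营" when no shop title is found, write a CSV row) can be tried without a browser. The following is only a minimal sketch under stated assumptions: `build_row`, `rows_to_csv`, and `DEFAULT_SHOP` are hypothetical names that do not appear in the original script, the sample fragments are made up, and the helper also strips surrounding whitespace, which the original does not do.

```python
import csv
import io

# Fallback label the script uses when a product has no shop title
DEFAULT_SHOP = "京东自营"


def build_row(name_parts, price_parts, shop_titles):
    """Join XPath text fragments into one CSV row (hypothetical helper)."""
    shop = "".join(shop_titles).strip()
    return {
        "name": "".join(name_parts).strip(),
        "price": "".join(price_parts).strip(),
        "shop": shop or DEFAULT_SHOP,
    }


def rows_to_csv(rows):
    """Write the rows to an in-memory CSV and return it as text."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, ["name", "price", "shop"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up fragments, shaped like the lists that lxml's xpath() returns
sample = [
    build_row(["Python", "设计模式"], ["59.00"], ["某某图书专营店"]),
    build_row(["Head First ", "Python"], ["88.50"], []),
]
csv_text = rows_to_csv(sample)
print(csv_text)
```

Keeping this logic in a plain function would make it possible to check the fallback rule and the column order against saved fragments, so a change in JD's markup only affects the XPath expressions.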


Original article: https://blog.51cto.com/12268222/2386302
