
Python crawler: scraping Toutiao (今日头条) with Selenium (AJAX asynchronous loading)

Date: 2018-04-09 21:05:31

Tags: encoding, content, page, asynchronous, pen, www., str, coding, Toutiao (今日头条)

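from selenium import webdriver
from selenium.webdriver.common.by import By
from lxml import etree
from pyquery import PyQuery as pq
import time

driver = webdriver.Chrome()
driver.maximize_window()
# The implicit wait is a session-wide setting, so it only needs to be set once.
driver.implicitly_wait(10)
driver.get("https://www.toutiao.com/")
# Open the "科技" (Technology) channel via its link text.
driver.find_element(By.LINK_TEXT, "科技").click()
# Scroll down in 500px steps so the AJAX feed loads more items.
for x in range(3):
    js = "document.documentElement.scrollTop=" + str(x * 500)
    driver.execute_script(js)
    time.sleep(2)

time.sleep(5)  # give the last batch of items time to render
page = driver.page_source
# Parse the rendered HTML with PyQuery, then hand it to lxml for XPath queries.
doc = pq(page)
doc = etree.HTML(str(doc))
contents = doc.xpath('//div[@class="wcommonFeed"]/ul/li')
print(contents)
# Append every title found to toutiao.txt, one per line.
with open("toutiao.txt", "a+", encoding="utf8") as f:
    for x in contents:
        title = x.xpath("div/div[1]/div/div[1]/a/text()")
        if title:
            title = title[0]
            f.write(title + "\n")
            print(title)

driver.quit()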
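The title-extraction step can be tried offline, without launching a browser. The sketch below is only an illustration under stated assumptions: the SAMPLE markup is a hypothetical, hand-written fragment shaped the way the XPath in the script expects (it is not real Toutiao output), extract_titles is a helper name made up for this example, and the standard library's xml.etree.ElementTree (which supports a subset of XPath) stands in for lxml so that nothing needs to be installed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the structure implied by the XPath:
# div.wcommonFeed > ul > li > div > div > div > div > a
SAMPLE = """
<html><body>
<div class="wcommonFeed"><ul>
  <li><div><div><div><div><a>First headline</a></div></div></div></div></li>
  <li><div><div><div><div></div></div></div></div></li>
  <li><div><div><div><div><a>Second headline</a></div></div></div></div></li>
</ul></div>
</body></html>
"""


def extract_titles(markup):
    """Return the link text of every feed item that has a title link."""
    root = ET.fromstring(markup)
    titles = []
    for li in root.findall(".//div[@class='wcommonFeed']/ul/li"):
        # Same relative path the script uses for each <li>.
        link = li.find("div/div[1]/div/div[1]/a")
        if link is not None and link.text:
            titles.append(link.text.strip())
    return titles


titles = extract_titles(SAMPLE)
print(titles)
```

Items without a title link (the second li above) are skipped, which matches the `if title:` check in the script. Real page source is rarely well-formed XML, so the live script still needs an HTML-tolerant parser such as lxml's etree.HTML.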

 


Original article: https://www.cnblogs.com/hellangels333/p/8762112.html
