pyppeteer crawler example

Date: 2018-11-22 16:11:57


import asyncio
import pyppeteer
from collections import namedtuple

# lightweight container for everything collected from one page
Response = namedtuple("Response", "title url html cookies headers history status")


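async def get_html(url, timeout=30):
    # timeout is in seconds (default 30 s); pyppeteer expects milliseconds
    browser = await pyppeteer.launch(headless=True, args=['--no-sandbox'])
    try:
        page = await browser.newPage()
        res = await page.goto(url, options={'timeout': int(timeout * 1000)})
        data = await page.content()  # HTML after the page's JavaScript has run
        title = await page.title()
        resp_cookies = await page.cookies()
        # goto() can return None (e.g. same-URL hash navigation), so guard it
        resp_headers = res.headers if res else {}
        resp_status = res.status if res else None
        resp_history = None  # redirect history is not collected
    finally:
        # close the browser even on errors, otherwise each call leaks a Chromium process
        await browser.close()
    response = Response(title=title, url=url,
                        html=data,
                        cookies=resp_cookies,
                        headers=resp_headers,
                        history=resp_history,
                        status=resp_status)
    return response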


async def main(urls):
    # fetch all pages concurrently; gather returns results in the same order as urls
    return await asyncio.gather(*(get_html(url) for url in urls))


if __name__ == '__main__':
    url_list = ["http://www.10086.cn/index/tj/index_220_220.html", "http://www.10010.com/net5/011/",
                "http://python.jobbole.com/87541/"]
    # asyncio.run() (Python 3.7+) replaces the get_event_loop()/run_until_complete() pattern
    results = asyncio.run(main(url_list))
    for res in results:
        print(res.title)
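The script above launches one headless Chromium per URL and starts all of them at once, which can exhaust memory when the URL list is long. A minimal sketch, assuming you want to cap how many pages load at the same time, wraps each call in an `asyncio.Semaphore`. The names `fetch_all`, `limit` and `fake_fetch` are hypothetical and introduced only for illustration; `fake_fetch` stands in for `get_html` so the sketch runs without a browser.

```python
import asyncio


async def fetch_all(urls, fetch, limit=3):
    """Run fetch(url) for every URL, but never more than `limit` at once.

    `fetch` is any coroutine function taking one URL (assumption: e.g. get_html
    from the script above); results are returned in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # wait for a free slot before starting another fetch
        async with semaphore:
            return await fetch(url)

    # gather preserves input order in its result list
    return await asyncio.gather(*(bounded(url) for url in urls))


if __name__ == '__main__':
    # hypothetical stand-in for get_html so the sketch runs without Chromium
    async def fake_fetch(url):
        await asyncio.sleep(0.01)
        return url.upper()

    print(asyncio.run(fetch_all(['a', 'b', 'c', 'd'], fake_fetch, limit=2)))
    # → ['A', 'B', 'C', 'D']
```

With the crawler above, the call would be `asyncio.run(fetch_all(url_list, get_html, limit=3))`; the limit of 3 is an arbitrary choice, not a value from the original post.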


Original post: https://www.cnblogs.com/c-x-a/p/10001353.html
