A Simple General-Purpose Python Crawler: Scraping Haokan Video

Date: 2021-03-17 14:20:20

Development environment:

·Python 3.6

·PyCharm

Modules used:

import requests
import time

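Analyzing the target page:

Pick the film and TV channel (the yingshi tab), press F12 or right-click and choose Inspect to open the developer tools, select the Network tab, and scroll down the page. Scrolling triggers a request to this URL: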

https://haokan.baidu.com/videoui/api/videorec?tab=yingshi&act=pcFeed&pd=pc&num=20&shuaxin_id=1608879736528

All of the data we need is in the JSON response returned by this URL.
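To see which query parameters the endpoint takes, the captured URL can be split with the standard library. This is only a minimal sketch: the names captured, base_url and params are chosen here for illustration, and nothing in it contacts the server.

```python
from urllib.parse import urlsplit, parse_qs

# The request URL captured in the Network tab.
captured = ("https://haokan.baidu.com/videoui/api/videorec"
            "?tab=yingshi&act=pcFeed&pd=pc&num=20&shuaxin_id=1608879736528")

parts = urlsplit(captured)
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# parse_qs returns a list of values per key; each key occurs once here,
# so keep only the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(base_url)
print(params)
```

The shuaxin_id value has 13 digits, which is consistent with a Unix timestamp in milliseconds; that reading is an inference from the value, not something the site documents.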

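play_url: the URL of the video file

title: the title of the video

Both values can be read from the JSON by key, which gives us each video's address and title.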
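Reading by key can be sketched as follows. The sample dictionary is hypothetical and heavily trimmed, with made-up values; only the data, response, videos nesting and the title and play_url keys come from this post. The helper extract_videos is likewise an illustrative name.

```python
# Hypothetical, trimmed response; the values are invented for illustration.
sample = {
    "data": {
        "response": {
            "videos": [
                {"title": "Example clip A", "play_url": "https://example.com/a.mp4"},
                {"title": "Example clip B", "play_url": "https://example.com/b.mp4"},
                {"title": "Entry without a URL"},
            ]
        }
    }
}


def extract_videos(payload):
    """Return (title, play_url) pairs, skipping entries that lack either key."""
    videos = payload.get("data", {}).get("response", {}).get("videos", [])
    pairs = []
    for item in videos:
        title = item.get("title")
        play_url = item.get("play_url")
        if title and play_url:
            pairs.append((title, play_url))
    return pairs


for title, play_url in extract_videos(sample):
    print(title, play_url)
```

Using .get() instead of square brackets means an unexpected response shape yields an empty list rather than a KeyError.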

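Notice that every request goes to the same address yet returns different content, so to scrape more videos you can simply wrap the request in an infinite loop.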
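An infinite loop never stops on its own, so a bounded variant may be easier to experiment with. The sketch below is one possible shape, not the post's method: build_params, collect and fake_fetch are hypothetical names, and the idea that the feed can repeat a video is an assumption. The network call is replaced by a stand-in so the example runs offline.

```python
import time


def build_params(tab="yingshi", num=20):
    """Query parameters with a fresh millisecond timestamp as shuaxin_id."""
    return {
        "tab": tab,
        "act": "pcFeed",
        "pd": "pc",
        "num": str(num),
        "shuaxin_id": int(time.time() * 1000),
    }


def collect(fetch_page, max_pages=3, delay=0.0):
    """Call fetch_page(params) max_pages times and drop repeated play_urls.

    fetch_page is a hypothetical callable that returns a list of video dicts.
    """
    seen = set()
    results = []
    for _ in range(max_pages):
        for video in fetch_page(build_params()):
            url = video["play_url"]
            if url not in seen:
                seen.add(url)
                results.append(video)
        time.sleep(delay)  # pause between requests to go easy on the server
    return results


# Stand-in for a real network call; always returns the same two videos.
def fake_fetch(params):
    return [{"title": "A", "play_url": "u1"}, {"title": "B", "play_url": "u2"}]


print(len(collect(fake_fetch)))
```

In real use, fetch_page would wrap the requests.get call and return the videos list from the response, and delay would be set to a second or more.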

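Result: each video is saved as an .mp4 file in the video folder, and its title is printed once it has been written.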

Implementation:

import os
import re
import time

import requests

url = 'https://haokan.baidu.com/videoui/api/videorec'
headers = {
    'cookie': 'add your own cookie',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}

# Make sure the output folder exists before writing into it.
os.makedirs('video', exist_ok=True)

# Runs until interrupted (Ctrl+C).
while True:
    # shuaxin_id is the current time in milliseconds.
    now_time = int(time.time() * 1000)
    params = {
        'tab': 'yingshi',
        'act': 'pcFeed',
        'pd': 'pc',
        'num': '20',
        'shuaxin_id': now_time,
    }

    response = requests.get(url=url, params=params, headers=headers)
    html_data = response.json()
    videos = html_data['data']['response']['videos']
    for i in videos:
        video_url = i['play_url']
        video_title = i['title']
        # Replace characters that are not allowed in file names.
        file_name = re.sub(r'[\\/:*?"<>|]', '_', video_title) + '.mp4'

        video_data = requests.get(url=video_url, headers=headers).content
        with open(os.path.join('video', file_name), mode='wb') as f:
            f.write(video_data)
        print('Saved:', video_title)
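Reading .content loads the whole video into memory before anything is written. A possible alternative is to write the file chunk by chunk. The helper below, save_stream, is a hypothetical name and takes any iterable of byte chunks, so the sketch can be run offline with in-memory data.

```python
import os
import tempfile


def save_stream(chunks, path):
    """Write an iterable of byte chunks to path; return the bytes written."""
    written = 0
    with open(path, mode="wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Offline demonstration with in-memory chunks instead of a real response.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mp4")
size = save_stream([b"abc", b"", b"defg"], demo_path)
print(size)
```

With requests, the chunks could come from requests.get(video_url, headers=headers, stream=True).iter_content(chunk_size=1024 * 1024), so that only about one megabyte is held in memory at a time.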

Original article: https://www.cnblogs.com/Martinaoh/p/14541415.html
