Scraping Zhihu Hot Topics with Python

Posted: 2018-12-11 13:04:56

This example is written with reference to Mr. Cui's book Python3网络爬虫开发实战 (Python 3 Web Crawler Development in Practice).

Looking at the page:

Every hot topic sits inside a div that carries the classes explore-feed and feed-item.

The source code is as follows:

import requests
from pyquery import PyQuery as pq

url = 'https://www.zhihu.com/explore'   # today's hottest
# url = 'https://www.zhihu.com/explore#monthly-hot'   # this month's hottest
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
}
html = requests.get(url, headers=headers).text
doc = pq(html)
# print(doc)
# Select every div that has both classes: explore-feed and feed-item
items = doc('.explore-feed.feed-item').items()
for item in items:
    question = item.find('h2').text()
    # Get the question
    print(question)
    author = item.find('.author-link').text()
    # Get the author
    print(author)
    answer = pq(item.find('.content').html()).text()
    # Get the answer (the book's version; I did not follow it, it probably needs jQuery knowledge)
    print(answer)
    print('====' * 10)
    answer1 = item.find('.zh-summary').text()
    # My own way of getting the answer
    print(answer1)

    # Write method 1: open and close the file explicitly
    file = open('知乎.txt', 'a', encoding='utf-8')
    file.write('\n'.join([question, author, answer]))
    file.write('\n' + '****' * 50 + '\n')
    file.close()

    # Write method 2: a with block closes the file automatically
    # (both methods are shown for comparison; keep only one, otherwise each record is written twice)
    with open('知乎.txt', 'a', encoding='utf-8') as fp:
        fp.write('\n'.join([question, author, answer]))
        fp.write('\n' + '****' * 50 + '\n')
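
The two write methods produce the same file content; the with form also closes the file if an exception is raised partway through. As a minimal sketch of the second method on its own, the snippet below opens the file once rather than once per item. The helper names format_record and append_records, the placeholder records, and the temporary file path are all assumptions made for this illustration and are not part of the original script.

```python
import os
import tempfile


def format_record(question, author, answer):
    """Join the three fields with newlines and end with a separator line."""
    return '\n'.join([question, author, answer]) + '\n' + '****' * 50 + '\n'


def append_records(path, records):
    """Append every (question, author, answer) tuple, opening the file only once."""
    with open(path, 'a', encoding='utf-8') as fp:
        for question, author, answer in records:
            fp.write(format_record(question, author, answer))


# Placeholder data standing in for scraped values
sample_records = [
    ('Question A', 'Author A', 'Answer A'),
    ('Question B', 'Author B', 'Answer B'),
]

# Write to a throwaway file so the sketch does not touch 知乎.txt
demo_path = os.path.join(tempfile.mkdtemp(), 'zhihu_demo.txt')
append_records(demo_path, sample_records)

with open(demo_path, encoding='utf-8') as fp:
    saved = fp.read()

print(saved.count('****' * 50))  # 2, one separator line per record
```

In the scraping loop, the same idea would mean collecting the (question, author, answer) tuples first and calling the helper once at the end.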

Running the script prints each question, author, and answer to the console and appends them to 知乎.txt.

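One odd thing is that the "today's hottest" URL and the "this month's hottest" URL return exactly the same results, and in both cases only the contents of five divs are scraped. The identical results have a simple explanation: the part after # is a URL fragment, which is never sent to the server, so both requests fetch the same page. The five-item limit is probably because Zhihu builds the page dynamically, so something like selenium may be needed.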
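
The fragment behaviour can be checked with the standard library alone. This is only a sketch of the URL handling, not of anything Zhihu does on its side: urldefrag splits a URL into the part that goes into the HTTP request and the fragment that stays on the client.

```python
from urllib.parse import urldefrag

daily_url = 'https://www.zhihu.com/explore'
monthly_url = 'https://www.zhihu.com/explore#monthly-hot'

# Split each URL into (URL without fragment, fragment)
daily_request_url, daily_fragment = urldefrag(daily_url)
monthly_request_url, monthly_fragment = urldefrag(monthly_url)

print(monthly_fragment)                          # monthly-hot
print(daily_request_url == monthly_request_url)  # True
```

Because the two request URLs are equal, the server cannot tell the requests apart; any switch to the monthly list would have to happen in the browser via JavaScript.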

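One more thing:

answer = pq(item.find('.content').html()).text()
# Get the answer (the book's version; I did not follow it, it probably needs jQuery knowledge)

I did not understand this line.

I still need to study jQuery.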


Original article: https://www.cnblogs.com/yuxuanlian/p/10101244.html
