
Python: Scraping Zhihu User Attributes to Generate a Word Cloud

Date: 2018-01-08 21:06:13


 

The code is as follows:

# -*- coding:utf-8 -*-

import requests
import pandas as pd
import time

import matplotlib.pyplot as plt
from wordcloud import WordCloud
import jieba

# Request headers sent with every API call
header = {
    'authorization': 'Bearer 2|1:0|10:1515395885|4:z_c0|92:Mi4xOFQ0UEF3QUFBQUFBRU1LMElhcTVDeVlBQUFCZ0FsVk5MV2xBV3dDLVZPdEhYeGxaclFVeERfMjZvd3lOXzYzd1FB|39008996817966440159b3a15b5f921f7a22b5125eb5a88b37f58f3f459ff7f8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36',
    'X-UDID': 'ABDCtCGquQuPTtEPSOg35iwD-FA20zJg2ps=',
}

user_data = []

def get_user_data(page):
    # Fetch `page` pages of followees, 20 records per page
    for i in range(page):
        url = 'https://www.zhihu.com/api/v4/members/excited-vczh/followees?include=data%5B*%5D.answer_count%2Carticles_count%2Cgender%2Cfollower_count%2Cis_followed%2Cis_following%2Cbadge%5B%3F(type%3Dbest_answerer)%5D.topics&offset={}&limit=20'.format(i*20)
        # response = requests.get(url, headers=header).text
        # ['data']: keep only the 'data' node of the JSON response
        response = requests.get(url, headers=header).json()['data']
        user_data.extend(response)
        print('Scraping page %s' % str(i+1))
        time.sleep(1)

if __name__ == '__main__':
    get_user_data(10)
    # pandas' from_dict() turns the list of response records directly into a DataFrame
    # df = pd.DataFrame.from_dict(user_data)
    # df.to_csv('D:/PythonWorkSpace/TestData/zhihu/user2.csv')
    df = pd.DataFrame.from_dict(user_data).get('headline')
    df.to_csv('D:/PythonWorkSpace/TestData/zhihu/headline.txt', encoding='utf-8')

    # Read the file back as UTF-8, matching the encoding it was written with
    with open('D:/PythonWorkSpace/TestData/zhihu/headline.txt', encoding='utf-8') as f:
        text_from_file_with_apath = f.read()
    wordlist_after_jieba = jieba.cut(text_from_file_with_apath, cut_all=True)
    wl_space_split = " ".join(wordlist_after_jieba)

    # Note: WordCloud's default font has no Chinese glyphs; to render Chinese
    # words, pass font_path=<path to a font with CJK support> to WordCloud().
    my_wordcloud = WordCloud().generate(wl_space_split)

    plt.imshow(my_wordcloud)
    plt.axis("off")
    plt.show()
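
The request URL in the script is written out by hand with the percent-encoding already applied, which makes the query hard to read. As a rough sketch of how the same query string could be assembled from readable pieces, the block below uses the standard library's urllib.parse.urlencode. The helper name build_followees_url and the constants are made up for this illustration (their values are decoded from the URL above), and the block only builds URLs without sending any request.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical constants for illustration; values decoded from the URL in the script.
BASE_URL = 'https://www.zhihu.com/api/v4/members/excited-vczh/followees'
INCLUDE = ('data[*].answer_count,articles_count,gender,follower_count,'
           'is_followed,is_following,badge[?(type=best_answerer)].topics')
PAGE_SIZE = 20


def build_followees_url(page_index, page_size=PAGE_SIZE):
    """Return the URL for one page of followees; page_index starts at 0."""
    params = {
        'include': INCLUDE,
        'offset': page_index * page_size,  # same rule as i*20 in the script
        'limit': page_size,
    }
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    for i in range(3):
        url = build_followees_url(i)
        # Decode the query again to show which offset each page requests
        query = parse_qs(urlsplit(url).query)
        print(i + 1, query['offset'][0], url)
```

urlencode escapes a few more characters than the hand-written URL does (for example * and the parentheses), but the decoded parameters are the same.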

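Writing the headline column with to_csv and reading the file back also pulls the row index numbers into the text that gets segmented, since to_csv writes the index by default. A minimal sketch of an alternative, assuming each record in user_data is a dict that may or may not carry a 'headline' string: collect the headlines straight from the list. The function name collect_headlines and the sample records are invented for this illustration.

```python
def collect_headlines(records):
    """Join the non-empty 'headline' values of the records, one per line."""
    headlines = []
    for record in records:
        headline = record.get('headline')
        # Skip records whose headline is missing, not a string, or blank
        if isinstance(headline, str) and headline.strip():
            headlines.append(headline.strip())
    return '\n'.join(headlines)


if __name__ == '__main__':
    # Made-up sample records standing in for user_data
    sample = [
        {'name': 'a', 'headline': 'Python developer'},
        {'name': 'b', 'headline': ''},
        {'name': 'c'},
        {'name': 'd', 'headline': ' data analyst '},
    ]
    print(collect_headlines(sample))
```

The returned string could then be passed to jieba.cut in place of the text read back from headline.txt.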
  

Libraries that need to be installed:

pip install matplotlib
pip install jieba
pip install wordcloud (this method turned out to fail)


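I then tried another installation method: download the library source from https://github.com/amueller/word_cloud, extract it, go into the extracted folder, hold Shift and right-click to open a command window, and run the following command: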

python setup.py install

This failed with the same error.

I will keep working on this tonight...
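
While sorting out the installation, it can help to check which of the libraries Python can actually find. The sketch below uses importlib.util.find_spec from the standard library; the function name missing_modules and the REQUIRED_MODULES list are made up for this illustration (the list simply mirrors the imports in the script). Note that find_spec only reports whether a module can be located, so it does not prove that a compiled extension will load correctly.

```python
import importlib.util

# Top-level module names imported by the script (hypothetical helper list)
REQUIRED_MODULES = ['requests', 'pandas', 'matplotlib', 'jieba', 'wordcloud']


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print('Not installed:', ', '.join(missing))
    else:
        print('All required modules can be found.')
```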

 


Original article: https://www.cnblogs.com/PeterZhang1520389703/p/8244633.html
