Recently I needed to scrape product comments from JD.com (mostly for books).
Without further ado, here is the code:
# -*- coding: utf-8 -*-
import csv
import json
import re

import requests

s = requests.Session()
url = 'https://club.jd.com/comment/productPageComments.action'
data = {
    'callback': 'fetchJSON_comment98vv13933',
    # ID of the product whose comments we want to scrape
    'productId': '11936238',
    # Meaning of the score parameter:
    # 0: all comments (positive ones first)
    # 1: all negative comments
    # 2: all neutral comments
    # 3: all follow-up comments
    # 4: all comments with pictures
    'score': 1,
    'sortType': 5,
    'page': 0,
    'pageSize': 10,
    'isShadowSku': 0,
    'fold': 1
}
# Target number of comments to scrape
target_cnt = 100
# Output file name, e.g. 11936238_1.csv
target_file = str(data['productId']) + '_' + str(data['score']) + '.csv'
cnt = 0  # number of comments written so far
with open(target_file, "w", encoding='utf8', newline='') as csvFile:
    writer = csv.writer(csvFile, quoting=csv.QUOTE_ALL)
    writer.writerow(["stars", "time", "comment"])
    while cnt < target_cnt:
        t = s.get(url, params=data).text
        # The response is JSONP: strip the fetchJSON_comment98vv13933( ... ); wrapper
        m = re.search(r'(?<=fetchJSON_comment98vv13933\().*(?=\);)', t)
        if m is None:
            break  # unexpected response (blocked, or no more data)
        j = json.loads(m.group(0))
        commentSummary = j['comments']
        if not commentSummary:
            break  # empty page: no more comments
        for comment in commentSummary:
            c_content = comment['content']  # comment text
            c_time = comment['referenceTime']
            c_name = comment['nickname']
            c_client = comment['userClientShow']
            score = comment['score']
            print(score)
            print('{} {} {}\n{}\n'.format(c_name, c_time, c_client, c_content))
            writer.writerow([score, c_time, c_content])
            cnt += 1
            if cnt >= target_cnt:
                break
        data['page'] += 1
# Leaving the with block closes csvFile, so no explicit close() is needed
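The only non-obvious step above is the regex: the endpoint returns JSONP, i.e. the JSON payload wrapped in a call to the `callback` name sent in `data`, and the lookbehind/lookahead pair simply cuts out what sits between `fetchJSON_comment98vv13933(` and `);`. A minimal standalone sketch of that step follows. The helper names `strip_jsonp` and `to_rows` are hypothetical, and the callback name is passed in and escaped with `re.escape` (an assumption, so the helper works with any callback value), which lets the parsing be tried offline on a made-up response string.

```python
import json
import re


def strip_jsonp(text: str, callback: str) -> dict:
    """Remove the JSONP wrapper `callback(...);` and parse the JSON inside.

    Raises ValueError if the wrapper is missing, e.g. when the request was
    blocked and an HTML page came back instead of comments."""
    # Lookbehind anchors right after "callback(", lookahead stops before ");".
    # The greedy .* backtracks to the LAST ");", so a ");" inside a comment
    # does not cut the payload short. re.S lets . also match newlines.
    pattern = r'(?<=' + re.escape(callback) + r'\().*(?=\);)'
    m = re.search(pattern, text, re.S)
    if m is None:
        raise ValueError('JSONP wrapper not found in response')
    return json.loads(m.group(0))


def to_rows(payload: dict) -> list:
    """Turn the parsed payload into [score, referenceTime, content] rows,
    the same three columns the script writes to the CSV."""
    return [[c['score'], c['referenceTime'], c['content']]
            for c in payload.get('comments', [])]


if __name__ == '__main__':
    sample = ('fetchJSON_comment98vv13933({"comments": [{"score": 1, '
              '"referenceTime": "2018-05-01 10:00:00", "content": "bad"}]});')
    print(to_rows(strip_jsonp(sample, 'fetchJSON_comment98vv13933')))
```

Raising an exception instead of silently breaking makes it easier to tell "blocked" apart from "no more comments" when the script stops early.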
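That's about all there is to explain. I should add that I found this scraper elsewhere. It is also the simplest kind of scraper: it does nothing to get around anti-scraping measures. I'll try to write about those some other time.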
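As a rough illustration of the simplest first steps (a sketch under assumptions, not something tested against JD), a scraper can send browser-like headers and pause a random interval between page requests. The `HEADERS` values, the Referer URL pattern and the helper names `fetch_comments`/`fetch_page` are all hypothetical. `fetch_page(page)` stands in for a function that calls `s.get(url, params=data, headers=HEADERS)` for that page and returns the parsed `comments` list, so the paging logic can be tried without touching the network.

```python
import random
import time
from typing import Callable, Dict, List

# Hypothetical browser-like headers; which headers JD actually checks is an assumption.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Referer': 'https://item.jd.com/11936238.html',
}


def fetch_comments(fetch_page: Callable[[int], List[Dict]],
                   target_cnt: int,
                   min_delay: float = 1.0,
                   max_delay: float = 3.0) -> List[Dict]:
    """Collect up to target_cnt comments page by page, sleeping a random
    interval between requests so the traffic looks less like a bot."""
    collected: List[Dict] = []
    page = 0
    while len(collected) < target_cnt:
        comments = fetch_page(page)
        if not comments:  # empty page: nothing more to fetch
            break
        # Keep only as many comments as are still needed
        collected.extend(comments[:target_cnt - len(collected)])
        page += 1
        if len(collected) < target_cnt:
            time.sleep(random.uniform(min_delay, max_delay))
    return collected


if __name__ == '__main__':
    # Fake fetcher: three pages of 10 comments, then an empty page
    def fake_page(page: int) -> List[Dict]:
        return [{'id': page * 10 + i} for i in range(10)] if page < 3 else []

    print(len(fetch_comments(fake_page, 25, 0, 0)))
```

Passing the fetcher in as a function keeps the delay and stopping logic separate from the HTTP details, so either side can be changed without touching the other.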
Source: GitHub
Original post: https://www.cnblogs.com/georgeyang/p/9077118.html