Tags: ret ade pve order content strong AC match split
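I. Define the target
II. Analyze how the page loads
III. Analyze the request that returns the target data: inspect its parameters and construct the URL yourself
1. Find the URL: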
https://rate.tmall.com/list_detail_rate.htm?itemId=539137284584&spuId=701871908&sellerId=929347050&order=3&currentPage=1&append=0&content=1&tagId=&posi=&picture=&ua=098%23E1hvLpvWvRQvUvCkvvvvvjiPPFSptjlbPLsy6jYHPmP96jrWn2s9ljiEPFMyQjrURphvCvvvvvmCvpvW7D%2BnMq5w7Di4OzbNdphvHmQhsUE8o9v9BmeS8kH2mOcEmfwGiQhvCvvv9UUPvpvhvv2MMQhCvvOvUvvvphmivpvUvvmv%2BJZCZ94EvpvVmvvC9jxvKphv8vvvvvCvpvvvvvmmH6CvvHIvvUUdphvWvvvv9krvpv3Fvvmm86CvmVWEvpCWCh%2BMvvaw1WCl%2Bb8rwZHlYhzBRfpKofkXAf00Io3EAp0YyfUZEcqh1j7yHdUfbcc6D76fde%2BRfwLvaB46NZ59QnkQRqwiLO2vqU0QKLyCvvpvvvvv3QhvCvvhvvv%3D&isg=BBwcrmBIqyRNj10slC4flSrd7ToOPcHVm6szQvYdFofqQb3LHqQ2T4ezpam5SfgX&needFold=0&_ksTS=1527496615091_664&callback=jsonp665
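2. Analyze the parameters
3. Construct the URL: pass the changing parameters to requests.get() through the pagram dict
4. Persist the data (here, to a CSV file)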
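Steps 1 to 3 can be tried without touching the network. The snippet below is a minimal sketch, not the author's code: it splits a captured URL's query string to show which fields change between requests, then rebuilds those fields for a given page. The shortened `captured` URL (the long `ua` and `isg` values are omitted) and the helper name `build_page_params` are assumptions for illustration. The `_ksTS`/`callback` pattern (millisecond timestamp, a counter, then the counter plus one) is inferred only from the samples shown in this post; whether the server validates it is not known.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the captured URL; the long ua/isg values are left out for readability.
captured = (
    "https://rate.tmall.com/list_detail_rate.htm"
    "?itemId=539137284584&spuId=701871908&sellerId=929347050"
    "&order=3&currentPage=1&append=0&content=1"
    "&needFold=0&_ksTS=1527496615091_664&callback=jsonp665"
)

# parse_qs maps each parameter name to a list of values.
query = parse_qs(urlsplit(captured).query)

# Parameters that differ from one request to the next.
DYNAMIC = ("currentPage", "_ksTS", "callback")
static_params = {k: v[0] for k, v in query.items() if k not in DYNAMIC}


def build_page_params(page, now_ms, counter):
    """Hypothetical helper: rebuild the dynamic parameters for one page."""
    return {
        "currentPage": page,
        "_ksTS": "%d_%d" % (now_ms, counter),
        "callback": "jsonp%d" % (counter + 1),
    }


params = build_page_params(page=2, now_ms=1527496615091, counter=664)
print(static_params)
print(params)
```

In a real run, `now_ms` could come from `int(time.time() * 1000)`, and the resulting dict would be passed as `params=` to `requests.get()`.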
The code is pasted below:
# -*- coding:utf-8 -*-

# _ksTS=1526545121518_1881: a hard-coded timestamp goes stale, so this
# parameter has to be generated dynamically (hence the time module)
# callback=jsonp1882
import requests
import time
import random
import re
import json

url = 'https://rate.tmall.com/list_detail_rate.htm?itemId=539137284584&spuId=701871908&sellerId=929347050&order=3&append=0&content=1&tagId=&posi=&picture=&ua=098%23E1hvsvvLvZIvUpCkvvvvvjiPPFdZ6jtPPLqOzjivPmPh1jDRRFchAjYbPsMh6jYWR46Cvvyv2vZjwchvJCurvpvEvvkUCgR2vV2LdphvmpvhOQb3gpCU4UhCvCLwMCHJGaMwznAY8xS50YAizRl4k46CvvyvCWgmYNZvECojvpvhvvpvvvGCvvpvvPMMuphvmvvv9bhrvjKCkphvC99vvOClpbyCvm9vvvvvphvvvvvv9F1vpvkjvvmmZhCv2CUvvUEpphvWwpvv9DCvpv11mphvLvp%2F6QvjWz7%2BkU97%2B3%2BraNBraB4AVAElYWmQrEt1pwLU%2BnezrmphQRAn3feAOHcIAXcBKFyK2ixrQj7Jymx%2F1j7QiXTAVArlMR29VEQCvpvVvvpvvhCvRphvCvvvvvm5vpvhvvmv9u6CvvyvCV4mRLyvVbervpvEvvBxvkgKv2kqRphvCvvvvvmCvpvZz2sm4VdNznswvCDfY0IwXaAv7Ihtvpvhvvvvvv%3D%3D&isg=BBgYp5ys9ga0jdox7XxaDMe26UbZGXLdB_e3zlII19NS7bvX-hKvGsuvISVdfTRj&needFold=0'

# Send the HTTP request
# t = time.time(): time() returns the current timestamp
# A CSV file can be opened in Excel

f = open('votes.csv', 'w', encoding='gbk')
# Header columns: review content, seller reply, nickname
f.write('评价内容,店家回复,昵称\n')
for i in range(99):
    t = str(time.time()).split('.')

    # Build the URL: these are the GET request parameters
    pagram = {
        'currentPage': i+1,
        '_ksTS': '%s_%s' % (t[0], t[1]),
        'callback': 'jsonp%s' % (int(t[1])+1)
    }

    # Sleep for a random interval so the requests are not too fast
    # and are less likely to be flagged by the site's behavior analysis
    time.sleep(random.random())

    response = requests.get(url, params=pagram)
    # Data persistence: a database or a file
    # CSV file: fields are separated by ','
    data = response.text

    # Parse the data: keep only the {...} object inside the jsonp wrapper
    data = re.findall(r'\{.*\}', data)[0]
    # The json module converts between JSON text and dicts

    # JSON text -> dict
    data = json.loads(data)
    data = data['rateDetail']['rateList']
    print(data)
    for item in data:
        # Swap ASCII commas for full-width ones so they do not split CSV
        # fields, and end each record with a newline
        f.write('%s,%s,%s\n' % (
            item['rateContent'].replace(',', ','),
            item['reply'].replace(',', ','),
            item['displayUserNick']))
f.close()
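The parsing and CSV steps of the script above can also be tried offline. This is a minimal sketch under stated assumptions: the `sample` string is a made-up response that only imitates the JSONP shape the script expects (`jsonpNNN({...})` containing `rateDetail` and `rateList`), and `unwrap_jsonp` and `rows_to_csv` are hypothetical helper names. It swaps the manual comma replacement for the standard `csv` module, which quotes any field that contains a comma.

```python
import csv
import io
import json
import re

# Made-up response body that imitates the JSONP shape; not real Tmall data.
sample = (
    'jsonp665({"rateDetail": {"rateList": ['
    '{"rateContent": "Good, fast delivery", "reply": "", "displayUserNick": "a***b"}'
    ']}})'
)


def unwrap_jsonp(text):
    """Return the dict found inside a callback(...) wrapper."""
    match = re.search(r"\{.*\}", text, re.S)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))


def rows_to_csv(rate_list):
    """Serialize reviews to CSV text; csv.writer quotes fields with commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["rateContent", "reply", "displayUserNick"])
    for item in rate_list:
        writer.writerow([
            item.get("rateContent", ""),
            item.get("reply") or "",  # tolerate a missing or null reply
            item.get("displayUserNick", ""),
        ])
    return buffer.getvalue()


rate_list = unwrap_jsonp(sample)["rateDetail"]["rateList"]
print(rows_to_csv(rate_list))
```

To write a real file instead of an in-memory buffer, the same `csv.writer` can wrap a file opened with `open('votes.csv', 'w', encoding='gbk', newline='')`.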
Original article: https://www.cnblogs.com/ruyingsuixing/p/9101051.html