
Using mitmproxy to analyze links for a Toutiao (今日头条) crawler

Date: 2019-01-25 17:48:02

Tags: utf-8   port   dal   The   style   nss   search   war   chardet

import pickle

import chardet
from mitmproxy import ctx
from pprint import pprint

# Output files for captured request URLs and response bodies
heads_file = "header.txt"

body_file = "body.txt"

# Run with: mitmdump -s test.py
# Dalvik/2.1.0 (Linux; U; Android 8.1.0; MI 8 MIUI/8.8.31)
def request(flow):
    # Override the User-Agent request header (disabled)
    # flow.request.headers['User-Agent'] = 'Mozilla/5.0 (Linux; U; Android 6.0.1; zh-cn; MI 5s Build/MXB48T) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.146 Mobile Safari/537.36 XiaoMi/MiuiBrowser/8.7.1'
    # ctx.log.warn(str(flow.request.url))
    # ctx.log.info(str(flow.request.headers))
    # pprint(vars(flow.request))
    # ctx.log.error(str(dir(flow.request)))
    # ctx.log.info("data.content:" + str(flow.request.data.content))
    # ctx.log.info("data:" + str(dir(flow.request.data)))
    # ctx.log.info("content:" + str(flow.request.content))
    # ctx.log.info(flow.request.headers['User-Agent'])
    url = str(flow.request.url)
    ctx.log.info("url:" + url)
    # Record only article and search URLs (disabled)
    # if 'pstatp.com/article' in url or 'snssdk.com/article' in url or 'snssdk.com/api/search' in url:
    #     file = open(heads_file, encoding="utf-8", mode="a")
    #     file.write(url + "\n")
    #     file.close()
    # Append every requested URL to other.txt, one per line
    with open("other.txt", encoding="utf-8", mode="a") as fileother:
        fileother.write(url + "\n")
    # pickle needs a binary file handle, hence mode 'ab'
    # with open(heads_file, 'ab') as handle:
    #     pickle.dump(flow.request.url, handle)
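
The commented-out condition in `request` suggests that only a few URL patterns are really of interest. A minimal sketch of that check as a standalone function, so it can be tried without running mitmproxy, might look like the following. The names `ARTICLE_URL_MARKERS` and `is_article_url` are hypothetical; the three substrings are the ones from the commented-out `if` line, and the sample URLs are made up.

```python
# Substrings taken from the commented-out condition in request().
# The constant and function names are illustrative, not part of the original script.
ARTICLE_URL_MARKERS = (
    "pstatp.com/article",
    "snssdk.com/article",
    "snssdk.com/api/search",
)


def is_article_url(url: str) -> bool:
    """Return True if the URL contains any of the substrings of interest."""
    return any(marker in url for marker in ARTICLE_URL_MARKERS)


if __name__ == "__main__":
    # Made-up URLs, used only to exercise the filter.
    print(is_article_url("https://a3.pstatp.com/article/content/1/"))  # True
    print(is_article_url("https://example.com/index.html"))            # False
```

Inside `request`, the write to `heads_file` could then be guarded by `if is_article_url(url):`, while all other URLs still go to `other.txt`.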


# Response hook (disabled): logs status, headers and cookies, then appends the decoded body to body_file
# def response(flow):
#     response = flow.response
#     info = ctx.log.info
#     info(str(response.status_code))
#     info(str(response.headers))
#     info(str(response.cookies))
#     # info(str(response.encoding))
#     detRes = chardet.detect(response.content)  # returns the detected encoding
#     charset = detRes["encoding"]
#     info(str(charset))
#     # text = response.content.decode(charset, "ignore")
#     if not charset:
#         charset = 'utf-8'
#     text = str(response.content, encoding=charset)
#     info(text)
#     file = open(body_file, encoding=charset, mode="a")
#     file.write(text)
#     file.close()
#     # pickle needs a binary file handle, hence mode 'ab'
#     # with open(body_file, 'ab') as handle:
#     #     pickle.dump(text, handle)
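
The disabled `response` hook guesses the body's encoding with chardet and falls back to UTF-8 when nothing is detected. A sketch of that decoding step as a pure function is shown below. `detect_charset` and `decode_body` are hypothetical names. The guarded import (so the sketch still runs when chardet is not installed) and the `LookupError` fallback are additions that are not in the original script. The `"ignore"` error handler follows the commented-out `decode(charset, "ignore")` variant.

```python
from typing import Optional

try:
    import chardet  # third-party; treated as optional in this sketch
except ImportError:  # assumption: without chardet, only the UTF-8 default is used
    chardet = None


def detect_charset(content: bytes) -> Optional[str]:
    """Return chardet's guess for the encoding, or None if there is no guess."""
    if chardet is None or not content:
        return None
    return chardet.detect(content)["encoding"]


def decode_body(content: bytes, default: str = "utf-8") -> str:
    """Decode bytes with the detected charset, falling back to `default`."""
    charset = detect_charset(content) or default
    try:
        return content.decode(charset, "ignore")
    except LookupError:
        # The detected name may not correspond to a codec Python knows.
        return content.decode(default, "ignore")


if __name__ == "__main__":
    print(decode_body(b"hello"))
```

In the hook itself this would replace the `chardet.detect` / `str(response.content, encoding=charset)` lines with `text = decode_body(response.content)`.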


Original article: https://www.cnblogs.com/procedureMonkey/p/10320322.html
