SpiderKeeper: Add a Stats Link That Filters the Latest Log Information

Posted: 2018-06-09 19:50:08

Tags: ecs find git crawl table pid http span add

0. Reference

https://github.com/DormyMo/SpiderKeeper

1. Add a Stats link to the Job Dashboard page

python3.6/site-packages/SpiderKeeper/app/templates/job_dashboard.html

Search for /log to locate the place to edit.

1.1 Add a Stats table column

[Screenshot: the Stats column added to the table in job_dashboard.html]

1.2 Add the Stats link

[Screenshot: the Stats link added in job_dashboard.html]
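Since the screenshot does not show the link target, here is a rough sketch of the path the Stats link would need to point to. It mirrors the route pattern registered in section 3.1. The helper name `stats_path` is hypothetical and is not part of SpiderKeeper; in the template itself the same path would be written inline next to the existing /log link.

```python
def stats_path(project_id, job_exec_id):
    """Build the URL path of the Stats view for one job execution.

    Hypothetical helper for illustration only; it mirrors the route
    pattern /project/<project_id>/jobexecs/<job_exec_id>/stats.
    """
    return "/project/{}/jobexecs/{}/stats".format(project_id, job_exec_id)


if __name__ == "__main__":
    # Example ids are made up.
    print(stats_path(1, 42))
```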

2. Resulting page

[Screenshot: Job Dashboard page showing the new Stats link]

3. Filter the latest log information

python3.6/site-packages/SpiderKeeper/app/spider/controller.py

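In essence, this uses requests to fetch the scrapyd log page and then lays the content out again. Remember to escape the text before returning it as HTML.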
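A minimal sketch of why escaping matters here: Scrapy log lines contain angle brackets, which a browser would otherwise treat as HTML tags and hide. The sample line is hand-written for illustration, modeled on Scrapy's "Scraped from" message.

```python
from html import escape

# Hand-written sample line, not taken from a real crawl.
line = "DEBUG: Scraped from <200 http://example.com/>"

# escape() replaces &, < and > (and quotes) with HTML entities,
# so the browser shows the brackets instead of parsing them as a tag.
escaped = escape(line)
print(escaped)
```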

Search for /log to locate the place to edit.

[Screenshot: the existing /log route in controller.py]

3.1 Add the Python code

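import re
from html import escape

# app, requests, agent and JobExecution are assumed to be available already
# in controller.py, where the existing /log view uses them.
@app.route("/project/<project_id>/jobexecs/<job_exec_id>/stats")
def job_stats(project_id, job_exec_id):
    job_execution = JobExecution.query.filter_by(project_id=project_id, id=job_exec_id).first()
    # Fetch the raw log page from scrapyd.
    res = requests.get(agent.log_url(job_execution))
    res.encoding = 'utf-8'
    # return res.text  # uncomment to return the unfiltered log instead
    # For each pattern, keep only the last (most recent) match.
    # Latest redirect
    m = re.findall(r'\n.*?Redirecting\s+\(.*', res.text)
    a = m[-1] if m else ''
    # Latest crawled response
    m = re.findall(r'\n.*?Crawled\s+\(.*', res.text)
    b = m[-1] if m else ''
    # Latest "Scraped from" line
    m = re.findall(r'\n.*?Scraped\s+from\s+<.*', res.text)
    c = m[-1] if m else ''
    # Latest line containing a {...} block
    m = re.findall(r'\n.*?{.*?}', res.text)
    d = m[-1] if m else ''
    # Latest "Crawled N pages" summary line
    m = re.findall(r'\n.*?Crawled\s+\d+\s+pages\s+\(.*', res.text)
    e = m[-1] if m else ''
    # Escape each line so angle brackets in the log are shown literally.
    return '<br>'.join([escape(i) for i in [a, b, c, d, e]])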
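To try the filtering logic without a running SpiderKeeper or scrapyd, the same steps can be pulled into a plain function. This is only a sketch: the name `latest_log_lines` and the sample log text are hypothetical. The sample lines are hand-written in the style of Scrapy's log messages, not copied from a real crawl.

```python
import re
from html import escape

# Same five patterns as the view above, in the same order.
PATTERNS = [
    r'\n.*?Redirecting\s+\(.*',
    r'\n.*?Crawled\s+\(.*',
    r'\n.*?Scraped\s+from\s+<.*',
    r'\n.*?{.*?}',
    r'\n.*?Crawled\s+\d+\s+pages\s+\(.*',
]


def latest_log_lines(log_text):
    """Return the last match of each pattern, HTML-escaped and joined by <br>."""
    picked = []
    for pattern in PATTERNS:
        matches = re.findall(pattern, log_text)
        # An empty string keeps the position when a pattern never matched.
        picked.append(matches[-1] if matches else '')
    return '<br>'.join(escape(line) for line in picked)


# Hand-written sample log for illustration only.
SAMPLE_LOG = (
    "\n2018-06-09 19:00:00 [scrapy.core.engine] DEBUG: "
    "Crawled (200) <GET http://example.com/1> (referer: None)"
    "\n2018-06-09 19:00:01 [scrapy.core.engine] DEBUG: "
    "Crawled (200) <GET http://example.com/2> (referer: None)"
    "\n2018-06-09 19:01:00 [scrapy.extensions.logstats] INFO: "
    "Crawled 2 pages (at 2 pages/min), scraped 0 items (at 0 items/min)"
)

if __name__ == "__main__":
    print(latest_log_lines(SAMPLE_LOG))
```

Each pattern starts with `\n`, so a match can only begin at the start of a log line after the first, and the log's very first line is never picked up. That is also why the sample text begins with a newline.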

4. Filtered log output

[Screenshot: the filtered log output shown by the Stats page]


Original post: https://www.cnblogs.com/my8100/p/SpiderKeeper_stats.html
