3. Hadoop-based nginx access log analysis: computing hourly PV

Date: 2016-12-28 12:20:12


Code:

# cat pv_hour.py 
#!/usr/bin/env python
# coding=utf-8

from mrjob.job import MRJob
from nginx_accesslog_parser import NginxLineParser

class PvDay(MRJob):

    nginx_line_parser = NginxLineParser()

    def mapper(self, _, line):

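        self.nginx_line_parser.parse(line)
        # str(time_local) is expected to be "<date> <time>"; keep only the time part
        _, tm = str(self.nginx_line_parser.time_local).split()
        h, m, s = tm.split(':')
        yield h, 1  # one page view for this hour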

    def reducer(self, key, values):
        yield key, sum(values)

def main():
    PvDay.run()

if __name__ == '__main__':
    main()
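
The job above relies on the `nginx_accesslog_parser` module, which is not shown here. As a rough illustration of what the mapper and reducer compute together, here is a minimal sketch in plain Python without mrjob. It assumes the log uses nginx's default "combined" format, where the timestamp appears in brackets such as `[27/Dec/2016:14:03:21 +0800]`; the names `hour_of` and `pv_per_hour` are hypothetical and not part of the original code.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: nginx "combined" log format, with $time_local in square brackets.
TIME_LOCAL_RE = re.compile(r"\[([^\]]+)\]")


def hour_of(line):
    """Return the two-digit hour of a log line, or None if it cannot be parsed."""
    match = TIME_LOCAL_RE.search(line)
    if match is None:
        return None
    try:
        # %b (month abbreviation) is locale dependent; this assumes an English/C locale.
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None
    return ts.strftime("%H")


def pv_per_hour(lines):
    """Count page views per hour: the map step emits (hour, 1), the reduce step sums."""
    counts = Counter()
    for line in lines:
        hour = hour_of(line)
        if hour is not None:
            counts[hour] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [27/Dec/2016:14:03:21 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"',
        '1.2.3.4 - - [27/Dec/2016:14:59:59 +0800] "GET /a HTTP/1.1" 200 10 "-" "curl"',
        '5.6.7.8 - - [28/Dec/2016:03:00:00 +0800] "GET /b HTTP/1.1" 404 0 "-" "curl"',
    ]
    for hour, pv in sorted(pv_per_hour(sample).items()):
        print(hour, pv)
```

This runs on a single machine and is only practical for small files; the mrjob version expresses the same counting as separate map and reduce steps so that it can be distributed on Hadoop.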

Output:

# python3 pv_hour.py access_all.log-20161227 
No configs found; falling back on auto-configuration
Creating temp directory /tmp/pv_hour.root.20161228.025503.341576
Running step 1 of 1...
Streaming final output from /tmp/pv_hour.root.20161228.025503.341576/output...
"14"    21158
"15"    20958
"16"    16080
"17"    14194
"18"    13114
"19"    16898
"20"    18870
"21"    14067
"22"    14053
"23"    12683
"00"    13185
"01"    14785
"02"    12449
"03"    7364
"04"    3628
"05"    9074
"06"    9317
"07"    11887
"08"    13492
"09"    19564
"10"    18390
"11"    15697
"12"    17518
"13"    18785
Removing temp directory /tmp/pv_hour.root.20161228.025503.341576...
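
The hours in the output above are not in chronological order (the listing starts at "14"). Here is a minimal sketch for post-processing the result, assuming the output lines were saved as text and that each result line is a JSON-encoded key and value separated by whitespace, as in the listing. The function names are hypothetical.

```python
import json


def parse_output(lines):
    """Parse result lines such as '"14"    21158' into a {hour: count} dict."""
    counts = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and anything that is not a quoted-key result line.
        if not line.startswith('"'):
            continue
        key, value = line.split(None, 1)  # split on the first run of whitespace
        counts[json.loads(key)] = json.loads(value)
    return counts


def by_hour(counts):
    """Return (hour, count) pairs sorted chronologically, "00" through "23"."""
    return sorted(counts.items())


def peak_hour(counts):
    """Return the (hour, count) pair with the most page views."""
    return max(counts.items(), key=lambda kv: kv[1])
```

Applied to the full listing above, `peak_hour` would return hour "14", which has the largest count (21158).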

 


Original article: http://www.cnblogs.com/xiaoming279/p/6228622.html
