
Using Pig to Count the Number of Hits per IP

Date: 2015-02-08 11:43:11


The log file has the following format:
220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
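In every sample line the client IP is the first space-separated field, which is the only field the job below needs. As a rough illustration (a sketch, not part of the original walkthrough), the field can be pulled out with awk; extract_ip is a hypothetical helper name chosen for this example.

```shell
# extract_ip is a hypothetical helper, not part of Pig or Hadoop.
# It prints the first whitespace-separated field (the client IP)
# of every log line read from standard input.
extract_ip() {
  awk '{ print $1 }'
}

# Feed it one of the sample lines shown above.
printf '%s\n' '208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"' | extract_ip
```

Note that awk splits on runs of whitespace by default, while a single-space delimiter splits on every space; for the first field of these lines the result is the same.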

1. Download Pig:
Download URL: http://www.apache.org/dyn/closer.cgi/pig

2. Install Pig:
Extract the archive:
[grid@hadoop1 ~]$ tar -zxf pig-0.14.0.tar.gz

Set the environment variables (add the lines below to .bash_profile):
[grid@hadoop1 ~]$ vi .bash_profile
PIG_INSTALL=/home/grid/pig-0.14.0
PIG_CLASSPATH=/home/grid/hadoop-1.2.1/conf/
PATH=$PATH:$PIG_INSTALL/bin
export PIG_INSTALL PATH PIG_CLASSPATH
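
After editing .bash_profile, the variables only take effect in a new login shell or after sourcing the file. The following is a minimal sketch of a sanity check, assuming the layout used above (a bin/pig launcher under PIG_INSTALL and a Hadoop conf directory in PIG_CLASSPATH); check_pig_env is a hypothetical helper, not something shipped with Pig.

```shell
# check_pig_env is a hypothetical helper, not part of Pig.
# It returns 0 only if PIG_INSTALL contains an executable bin/pig
# and PIG_CLASSPATH names an existing directory.
check_pig_env() {
  if [ -z "${PIG_INSTALL:-}" ] || [ ! -x "$PIG_INSTALL/bin/pig" ]; then
    echo "PIG_INSTALL is unset or has no executable bin/pig" >&2
    return 1
  fi
  if [ -z "${PIG_CLASSPATH:-}" ] || [ ! -d "$PIG_CLASSPATH" ]; then
    echo "PIG_CLASSPATH is unset or is not a directory" >&2
    return 1
  fi
  echo "PIG_INSTALL=$PIG_INSTALL"
  echo "PIG_CLASSPATH=$PIG_CLASSPATH"
}

# Report a hint instead of aborting when the check fails.
check_pig_env || echo "check .bash_profile, then source it or log in again"
```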

Set JAVA_HOME.
Edit the hosts file.

Verify the installation:
[grid@hadoop1 ~]$ pig -help

Connect to the Hadoop cluster:
[grid@hadoop1 ~]$ pig
grunt> ls
hdfs://hadoop1:9000/user/grid/in    <dir>
hdfs://hadoop1:9000/user/grid/out    <dir>

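3. Run the job:
Load the data. PigStorage(' ') splits each line on single spaces and the schema names only the first two fields. In this log format the second field is the "-" placeholder rather than the requested page, but only ip is used below:
grunt> A = LOAD 'in/8/access_log.txt' USING PigStorage(' ') AS (ip, page);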
grunt> DESCRIBE A;
A: {ip: bytearray,page: bytearray}
Drop the fields that are not needed:
grunt> B = FOREACH A GENERATE ip;
Group the records by IP:
grunt> C = GROUP B BY ip;
grunt> DESCRIBE C;
C: {group: bytearray,B: {(ip: bytearray)}}
Count the records in each group:
grunt> D = FOREACH C GENERATE group AS ip, COUNT(B) AS count;
View the results:
grunt> DUMP D;
(127.0.0.1,2)
(1.59.65.67,2)
(112.4.2.19,9)
(112.4.2.51,80)
(60.2.99.33,42)
(69.28.58.5,1)
(69.28.58.6,9)
(69.28.58.8,5)
(1.193.3.227,3)
(1.202.221.3,6)
(117.136.9.4,6)
(121.31.62.3,26)
(182.204.8.4,59)
(183.9.112.2,25)
(221.12.37.6,25)
(223.4.16.88,2)
(27.9.110.75,122)
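
The GROUP and COUNT steps above amount to tallying how often each first field occurs. As a cross-check on a small file, the same tally can be sketched outside Pig with awk. This is an illustration under stated assumptions, not the author's method: count_ips is a hypothetical helper, and the sample input lines are made up for the demo.

```shell
# count_ips is a hypothetical helper, not part of Pig or Hadoop.
# It reads access-log lines on standard input and prints one
# "ip<TAB>count" line per distinct first field, sorted by IP.
count_ips() {
  awk 'NF { hits[$1]++ }
       END { for (ip in hits) printf "%s\t%d\n", ip, hits[ip] }' | sort
}

# Illustrative input only (not taken from the real log file).
printf '%s\n' \
  '10.0.0.1 - - [illustrative] "GET /a HTTP/1.1" 200 1' \
  '10.0.0.2 - - [illustrative] "GET /b HTTP/1.1" 200 1' \
  '10.0.0.1 - - [illustrative] "GET /c HTTP/1.1" 200 1' | count_ips
```

For the illustrative input this reports 2 hits for 10.0.0.1 and 1 hit for 10.0.0.2. To compare against the Pig output, the real file could be piped in with something like hadoop fs -cat in/8/access_log.txt | count_ips; the row order will differ from DUMP D, so compare per IP rather than line by line.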
Original article: http://my.oschina.net/zc741520/blog/376475
