Getting Started with Pig, Part 2

Date: 2014-06-19 00:27:03


1. Exercise

　　First, create two data files. File A:

　　0,1,2

　　1,3,4

　　File B:

　　0,5,2

　　1,7,8

2. Start Pig and load A and B

　　Load data A, using a comma as the field delimiter:

　　grunt> a = load '/input/A' using PigStorage(',') as (a1:int, a2:int, a3:int);

　　Load data B:

　　grunt> b = load '/input/B' using PigStorage(',') as (b1:int, b2:int, b3:int);
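For readers who think more easily in Python, the effect of loading with a comma delimiter and an all-int schema can be sketched roughly as follows. This only models the idea and is not how Pig is implemented. `load_rows` is a hypothetical helper, the file contents are inlined rather than read from `/input`, and the sketch raises an error on a malformed field, which is a simplification.

```python
# Rough Python model of:
#   load '/input/A' using PigStorage(',') as (a1:int, a2:int, a3:int)
# `load_rows` is a hypothetical helper for illustration, not part of Pig.

def load_rows(text, delimiter=","):
    """Split each non-empty line on the delimiter and cast every field to int."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(tuple(int(field) for field in line.split(delimiter)))
    return rows

# Contents of the two data files from step 1, inlined for the sketch.
a = load_rows("0,1,2\n1,3,4\n")
b = load_rows("0,5,2\n1,7,8\n")

print(a)  # [(0, 1, 2), (1, 3, 4)]
print(b)  # [(0, 5, 2), (1, 7, 8)]
```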

3. Take the union of a and b

　　grunt> c = union a, b;

　　grunt> dump c;

　　(0,5,2)

　　(1,7,8)

　　(0,1,2)

　　(1,3,4)
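Note that the dump above lists the rows of b before those of a: a Pig union simply combines the two relations, keeps duplicates, and does not guarantee output order. A minimal Python sketch of that behaviour, with lists standing in for relations (an assumption made for illustration):

```python
# Rough model of: c = union a, b;
a = [(0, 1, 2), (1, 3, 4)]
b = [(0, 5, 2), (1, 7, 8)]

# Concatenate the two bags; nothing is deduplicated.
c = a + b

# Order is not meaningful in a Pig relation, so sort before comparing.
print(sorted(c))  # [(0, 1, 2), (0, 5, 2), (1, 3, 4), (1, 7, 8)]
```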

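4. Split c into d and e

　　Split c into d and e, where d holds the rows whose first column is 0 and e holds the rows whose first column is 1 ($0 refers to the first column of the relation):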

　　grunt> split c into d if $0 == 0, e if $0 == 1;

　　View d:

　　grunt> dump d;

　　　　(0,1,2)

　　　　(0,5,2)

　　View e:

　　grunt> dump e;

　　　　(1,3,4)

　　　　(1,7,8)
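The split can be pictured as evaluating each condition independently against every row. A minimal sketch in Python, assuming lists of tuples stand in for the relations:

```python
# Rough model of: split c into d if $0 == 0, e if $0 == 1;
c = [(0, 5, 2), (1, 7, 8), (0, 1, 2), (1, 3, 4)]

# Each output relation has its own condition on the first column ($0).
d = [row for row in c if row[0] == 0]
e = [row for row in c if row[0] == 1]

print(sorted(d))  # [(0, 1, 2), (0, 5, 2)]
print(sorted(e))  # [(1, 3, 4), (1, 7, 8)]
```

Because the conditions are checked separately, a row that matched neither condition would end up in neither output.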

5. Select a subset of c (records whose second column is greater than 3)

　　grunt> f = filter c by $1 > 3;

　　View f:

　　grunt> dump f;

　　　　(0,5,2)

　　　　(1,7,8)
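As a rough illustration, the filter keeps only the rows for which the predicate holds. The Python below is a sketch under the assumption that a list of tuples stands in for the relation c:

```python
# Rough model of: f = filter c by $1 > 3;
c = [(0, 5, 2), (1, 7, 8), (0, 1, 2), (1, 3, 4)]

# Keep rows whose second column ($1) is strictly greater than 3.
f = [row for row in c if row[1] > 3]

print(f)  # [(0, 5, 2), (1, 7, 8)]
```

The row (1,3,4) is dropped because the comparison is strict: 3 is not greater than 3.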

6. Group the data

　　grunt> g = group c by $2;

　　View g:

　　grunt> dump g;

　　　　(2,{(0,1,2),(0,5,2)})

　　　　(4,{(1,3,4)})

　　　　(8,{(1,7,8)})
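Each output row of the grouping pairs a key (the value of the third column) with a bag of all the rows that share it. One way to picture this in Python, with a dictionary of lists standing in for Pig's bags (an assumption made for illustration):

```python
from collections import defaultdict

# Rough model of: g = group c by $2;
c = [(0, 1, 2), (1, 3, 4), (0, 5, 2), (1, 7, 8)]

groups = defaultdict(list)
for row in c:
    groups[row[2]].append(row)  # key on the third column ($2)

# One (key, bag) pair per distinct key, sorted here only for stable printing.
g = sorted(groups.items())
for key, bag in g:
    print(key, bag)
```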

7. Gather all records into a single group

　　grunt> h = group c all;

　　grunt> dump h;

　　　　(all,{(0,1,2),(1,3,4),(0,5,2),(1,7,8)})
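The result is a single row whose key is the literal string all and whose bag holds every row of c, which is a convenient shape for whole-relation aggregates such as a row count. A minimal Python sketch of that shape, assuming a list stands in for the bag:

```python
# Rough model of: h = group c all;
c = [(0, 1, 2), (1, 3, 4), (0, 5, 2), (1, 7, 8)]

# A single (key, bag) pair: the constant key "all" and a bag of every row.
h = [("all", list(c))]

key, bag = h[0]
print(key, len(bag))  # all 4
```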

8. Join

　　grunt> j = join a by $2, b by $2;

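　　This operation is similar to a table join in SQL; here the join condition is that the third column ($2) of a equals the third column ($2) of b.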
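The original does not show the contents of j. Under the assumption that this is a plain inner equi-join on the third column, the result can be worked out with a small Python sketch (lists of tuples stand in for the relations):

```python
# Rough model of: j = join a by $2, b by $2;
a = [(0, 1, 2), (1, 3, 4)]
b = [(0, 5, 2), (1, 7, 8)]

# Pair every row of a with every row of b that has the same third column,
# and concatenate the fields of the two rows.
j = [row_a + row_b for row_a in a for row_b in b if row_a[2] == row_b[2]]

print(j)  # [(0, 1, 2, 0, 5, 2)]
```

Only the key 2 appears in both relations, so the rows with keys 4 and 8 have no partner and are dropped.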

9. Compute on columns

　　Take the second column of c ($1) and the product $1 * $2, and store these two columns in k:

　　grunt> k = foreach c generate $1, $1 * $2;

　　View the contents of k:

　　grunt> dump k;

　　　　(5,10)

　　　　(7,56)

　　　　(1,2)

　　　　(3,12)
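The projection is applied row by row: each input row yields one output row with the two generated expressions. A Python sketch of the arithmetic, assuming the rows of c arrive in the order shown by the earlier dump:

```python
# Rough model of: k = foreach c generate $1, $1 * $2;
c = [(0, 5, 2), (1, 7, 8), (0, 1, 2), (1, 3, 4)]

# For each row, emit the second column and the product of columns two and three.
k = [(row[1], row[1] * row[2]) for row in c]

print(k)  # [(5, 10), (7, 56), (1, 2), (3, 12)]
```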

Original article: http://www.cnblogs.com/jsunday/p/3789642.html