
Pig join usage examples

Date: 2015-06-07 23:11:01


jnd = join a by f1, b by f2;

 
By default, join performs an inner join: a record is kept only when its key matches on both sides.
 
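For an outer join, Pig needs to know the schema of the side that gets padded with nulls:
For a left outer join, the schema of the right-hand dataset must be known; its fields are filled with null for unmatched records.
For a right outer join, the schema of the left-hand dataset must be known; its fields are filled with null for unmatched records.
For a full outer join, the schemas of both datasets must be known; the fields of whichever side has no match are filled with null.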
 
By default, a join triggers a reduce phase.
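Because the default join runs in the reduce phase, its reducer count can be set with Pig's PARALLEL clause. A minimal sketch, assuming the same placeholder inputs used in the examples here; the value 10 is an arbitrary illustration, not a recommendation:

```pig
a = load 'input1';
b = load 'input2';
-- parallel sets the number of reducers for this join (10 is an arbitrary example value)
jnd = join a by $0, b by $1 parallel 10;
```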
 
Basic usage
a = load 'input1';
b = load 'input2';
jnd = join a by $0, b by $1;

   

Joining on multiple fields
a = load 'input1' as (username, age, city);
b = load 'input2' as (orderid, user, city);
jnd = join a by (username, city), b by (user, city);

   

Referencing fields after a join with ::
a = load 'input1' as (username, age, address);
b = load 'input2' as (orderid, user, money);
jnd = join a by username, b by user;
-- address needs no prefix because only a has a field with that name
result = foreach jnd generate a::username, a::age, address, b::orderid;
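To see which names are available after a join, the schema of the joined relation can be inspected with describe. A small sketch reusing the relations from this section; each field is listed with its source alias as a prefix:

```pig
a = load 'input1' as (username, age, address);
b = load 'input2' as (orderid, user, money);
jnd = join a by username, b by user;
-- prints the schema of jnd, with fields named a::username, b::orderid, and so on
describe jnd;
```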

   

Joining more than two datasets
a = load 'input1' as (username, age);
b = load 'input2' as (orderid, user);
c = load 'input3' as (user, acount);
jnd = join a by username, b by user, c by user;

   

Outer joins (two datasets only)
a = load 'input1' as (username, age);
b = load 'input2' as (orderid, user);
-- the three statements below are alternatives; the outer keyword is optional
jnd = join a by username left outer, b by user;
jnd = join a by username right, b by user;
jnd = join a by username full, b by user;
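Since the unmatched side of an outer join is padded with null, the records that found no partner can be picked out with an is null test. A minimal sketch using the same relations; the alias unmatched is a name chosen here for illustration:

```pig
a = load 'input1' as (username, age);
b = load 'input2' as (orderid, user);
jnd = join a by username left outer, b by user;
-- records from a with no match in b have null in every b field
unmatched = filter jnd by b::user is null;
```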

  

Self join: load the same dataset twice under different aliases
a = load 'data' as (node, parentid, name);
b = load 'data' as (node, parentid, name);
jnd = join a by node, b by parentid;
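In the self join, every field name exists on both sides, so each reference needs the :: prefix. A short sketch that projects parent and child names; the output names parent_name and child_name are chosen here for illustration:

```pig
a = load 'data' as (node, parentid, name);
b = load 'data' as (node, parentid, name);
jnd = join a by node, b by parentid;
-- a.node = b.parentid, so a holds the parent record and b holds the child record
pairs = foreach jnd generate a::name as parent_name, b::name as child_name;
```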

  

 
 


Original article: http://www.cnblogs.com/lishouguang/p/4559602.html
