
Pig: Multi-Query Execution

Date: 2014-05-04 17:52:35


-- First script: two STORE statements with a DUMP in between
A = LOAD '/user/input/t.txt' AS (k:chararray, c:int);
B = GROUP A BY k;
C = FOREACH B GENERATE group, SUM(A.c);

STORE C INTO '/user/output/test1.out';
DUMP C;
STORE C INTO '/user/output/test2.out';

-- Second script: the same pipeline without the DUMP
A = LOAD '/user/input/t.txt' AS (k:chararray, c:int);
B = GROUP A BY k;
C = FOREACH B GENERATE group, SUM(A.c);

STORE C INTO '/user/output/test1.out';

STORE C INTO '/user/output/test2.out';


With multi-query execution, Pig processes an entire script or a batch of statements at once and creates a batch job to process the data, instead of running each DUMP or STORE separately as it is encountered.
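
As a minimal sketch of what processing a batch at once means, the script below (an illustration, not taken from the original post) loads one input and stores two different filtered results. The aliases small and large, the threshold 10, and the output paths are hypothetical. With multi-query execution on, Pig can typically plan both STORE statements together and share the LOAD between them, rather than reading the input once per STORE.

```pig
-- Hypothetical example: one input, two outputs planned as a single batch.
A = LOAD '/user/input/t.txt' AS (k:chararray, c:int);

-- Two independent branches over the same relation.
small = FILTER A BY c < 10;
large = FILTER A BY c >= 10;

-- Neither STORE runs on its own; both are part of the same batch.
STORE small INTO '/user/output/small.out';
STORE large INTO '/user/output/large.out';
```

Running the same script with -M would instead execute each STORE on its own, so the input would be read separately for each one.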

Turning it On or Off

Multi-query execution is turned on by default. To turn it off and revert to Pig's "execute-on-dump/store" behavior, use the "-M" or "-no_multiquery" options.

To run script "myscript.pig" without the optimization, execute Pig as follows:

$ pig -M myscript.pig
or
$ pig -no_multiquery myscript.pig

The first script produces three MapReduce jobs, one for each of:

1. store C into '/user/output/test1.out'

2. DUMP C

3. store C into '/user/output/test2.out'

The second script produces only one MapReduce job.

If we run the second script with multi-query execution turned off (pig -no_multiquery test.pig), it produces two jobs, one per STORE.

Store vs. Dump

With multi-query execution, you want to use STORE to save (persist) your results. You do not want to use DUMP, as it will disable multi-query execution and is likely to slow down execution. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)



Original article: http://blog.csdn.net/xiewenbo/article/details/24978333
