码迷,mamicode.com
首页 > 数据库 > 详细

hive on tez sql 优化

时间:2015-05-06 18:22:57      阅读:458      评论:0      收藏:0      [点我收藏+]

标签:hive tez sql 优化

select t.id end_usr_id,

       t.end_user_name end_usr_name,

       t.end_user_real_name end_usr_nknm,

       t.end_user_password end_usr_pswd,

       t.end_user_birthday end_usr_bthd,

       t.mobile mobl_num,

       t.phone phone_num,

       t.valid_mobile_phone_num bindg_mobl_num,

       t.end_user_email end_usr_email,

       substr(t.end_user_email,

              instr(t.end_user_email, ‘@‘),

              instr(t.end_user_email, ‘.‘) - instr(t.end_user_email, ‘@‘)) email_class,

       t.end_user_create_time creat_time,

       t.end_user_last_login_date end_usr_last_login_date,

       t.end_user_last_bought_date end_usr_last_buy_date,

       t.end_user_login_times end_usr_login_times,

       t.end_user_bought_times end_usr_buy_times,

       t.end_user_bought_amount end_usr_buy_amt,

       t.end_user_type end_usr_type,

       t.ip ip,

       t.end_user_points end_usr_intgl,

       t.id_card id_card,

       t.end_user_sex sex,

       t.mc_site_id mc_site_id,

       case

         when t.is_email_activate = 1 then

          1

         else

          0

       end as email_verfd_flg,

       case

         when t.valid_mobile_phone_num is not null then

          1

         else

          0

       end as mobl_bding_flg,

       case

         when t.end_user_name like ‘%@sina%‘ then

          1

         when t.end_user_name like ‘%@pingan%‘ then

          2

         when t.end_user_name like ‘%@alipay%‘ and

              not (t2.user_grade in (1, 2) and t2.user_grade_type = 1) then

          3

         when t.end_user_name like ‘%@alipay%‘ and t2.user_grade in (1, 2) and

              t2.user_grade_type = 1 then

          4

         when t.end_user_name like ‘%@163%‘ then

          5

         when t.end_user_name like ‘%@kaixin001%‘ then

          6

         when t.end_user_name like ‘%@139%‘ then

          7

         when t.end_user_name like ‘%@msn.com%‘ then

          8

         when t.end_user_name like ‘%@anyue%‘ then

          9

         when t.end_user_name like ‘%@qq%‘ then

          10

         else

          0

       end as commn_login_type,

       case

         when t.end_user_name like ‘%@b2b%‘ or t.end_user_name = ‘yuxiaolan‘ then

          1

         else

          0

       end as b2b_flg,

       case

         when t3.send_to is not null then

          1

         else

          0

       end as email_ret_flg,

       case

         when t4.email is not null then

          1

         else

          0

       end as email_blacklist_flg,

       case

         when t5.car = ‘1‘ then

          1

         else

          0

       end as car_flg,

       t5.shopping_habit shppg_prefr,

       t6.mail_level email_lvl,

       ‘‘ whsl_flg,

       ‘‘ xh_flg,

       ‘‘ exptn_flg,

       ‘‘ b2b_vip_flg,

       t.member_grade end_usr_grd_id,

       case

         when t.co_code is null then

          null

         when t.co_code is not null and t7.usr_union_logon_id is null then

          -999999

         else

          t7.usr_union_logon_id

       end usr_union_logon_id,

       t.co_code usr_union_logon_code,

       t8.exp end_usr_grow_val

  from tandem.end_user t

  left outer join tandem.alipay_user t2

    on t.id = t2.end_user_id

  left outer join (select distinct lower(send_to) as send_to

               from tandem.yhd_send_message_blacklist) t3

    on lower(t.end_user_email) = t3.send_to

  left outer join (select distinct lower(email) as email

               from tandem.edm_error_email) t4

    on lower(t.end_user_email) = t4.email

  left outer join tmp_end_usr_info t5

    on t.id = t5.end_usr_id

  left outer join tandem.crm_user_quality t6

    on t.end_user_email = t6.email

   and t6.user_id is not null

  left outer join dw.usr_union_logon t7

    on t.co_code = t7.usr_union_logon_code

   and t7.usr_union_logon_code is not null

  left outer join tandem.member t8

    on t.id = t8.end_user_id

   and to_date(t8.update_time) = ‘2015-05-04‘

 where t.update_time >= ‘2015-05-04‘

    or length(t8.end_user_id)>0;


运行速度慢

执行计划如下:

STAGE DEPENDENCIES:

  Stage-1 is a root stage

  Stage-0 depends on stages: Stage-1


STAGE PLANS:

  Stage: Stage-1

    Tez

      Edges:

        Reducer 2 <- Map 1 (SIMPLE_EDGE)

        Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE), Reducer 8 (SIMPLE_EDGE)

        Reducer 4 <- Map 11 (SIMPLE_EDGE), Map 12 (BROADCAST_EDGE), Reducer 3 (SIMPLE_EDGE)

        Reducer 6 <- Map 5 (SIMPLE_EDGE)

        Reducer 8 <- Map 10 (SIMPLE_EDGE), Map 13 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE)

      DagName: zhoushugang_20150505143131_1821ced5-3723-4094-b6a3-c2aae136cd32:2

      Vertices:

        Map 1 

            Map Operator Tree:

                TableScan

                  alias: yhd_send_message_blacklist

                  Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

                  Select Operator

                    expressions: send_to (type: string)

                    outputColumnNames: send_to

                    Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

                    Group By Operator

                      keys: lower(send_to) (type: string)

                      mode: hash

                      outputColumnNames: _col0

                      Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

                      Reduce Output Operator

                        key expressions: _col0 (type: string)

                        sort order: +

                        Map-reduce partition columns: _col0 (type: string)

                        Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

        Map 10 

            Map Operator Tree:

                TableScan

                  alias: t5

                  Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE

                  Reduce Output Operator

                    key expressions: end_usr_id (type: bigint)

                    sort order: +

                    Map-reduce partition columns: end_usr_id (type: bigint)

                    Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE

                    value expressions: car (type: string), shopping_habit (type: string)

        Map 11 

            Map Operator Tree:

                TableScan

                  alias: t6

                  Statistics: Num rows: 38930100 Data size: 4515891712 Basic stats: COMPLETE Column stats: NONE

                  Filter Operator

                    predicate: user_id is not null (type: boolean)

                    Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE

                    Reduce Output Operator

                      key expressions: email (type: string)

                      sort order: +

                      Map-reduce partition columns: email (type: string)

                      Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE

                      value expressions: mail_level (type: double)

        Map 12 

            Map Operator Tree:

                TableScan

                  alias: t7

                  Statistics: Num rows: 16 Data size: 1752 Basic stats: COMPLETE Column stats: NONE

                  Filter Operator

                    predicate: usr_union_logon_code is not null (type: boolean)

                    Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE

                    Reduce Output Operator

                      key expressions: usr_union_logon_code (type: string)

                      sort order: +

                      Map-reduce partition columns: usr_union_logon_code (type: string)

                      Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE

                      value expressions: usr_union_logon_id (type: bigint)

        Map 13 

            Map Operator Tree:

                TableScan

                  alias: t8

                  Statistics: Num rows: 25660729 Data size: 3178607508 Basic stats: COMPLETE Column stats: NONE

                  Filter Operator

                    predicate: (to_date(update_time) = ‘2015-05-04‘) (type: boolean)

                    Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE

                    Reduce Output Operator

                      key expressions: end_user_id (type: bigint)

                      sort order: +

                      Map-reduce partition columns: end_user_id (type: bigint)

                      Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE

                      value expressions: exp (type: int)

        Map 5 

            Map Operator Tree:

                TableScan

                  alias: edm_error_email

                  Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

                  Select Operator

                    expressions: email (type: string)

                    outputColumnNames: email

                    Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

                    Group By Operator

                      keys: lower(email) (type: string)

                      mode: hash

                      outputColumnNames: _col0

                      Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

                      Reduce Output Operator

                        key expressions: _col0 (type: string)

                        sort order: +

                        Map-reduce partition columns: _col0 (type: string)

                        Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

        Map 7 

            Map Operator Tree:

                TableScan

                  alias: t

                  Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE

                  Reduce Output Operator

                    key expressions: id (type: bigint)

                    sort order: +

                    Map-reduce partition columns: id (type: bigint)

                    Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE

                    value expressions: end_user_name (type: string), end_user_password (type: string), end_user_real_name (type: string), end_user_birthday (type: string), end_user_last_login_date (type: string), end_user_last_bought_date (type: string), end_user_login_times (type: double), end_user_bought_amount (type: double), end_user_bought_times (type: double), end_user_sex (type: double), end_user_create_time (type: string), end_user_type (type: double), ip (type: string), end_user_points (type: double), co_code (type: string), is_email_activate (type: double), mc_site_id (type: bigint), update_time (type: string), member_grade (type: double), end_user_email (type: string), mobile (type: string), phone (type: string), valid_mobile_phone_num (type: int), id_card (type: string)

        Map 9 

            Map Operator Tree:

                TableScan

                  alias: t2

                  Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE

                  Reduce Output Operator

                    key expressions: end_user_id (type: bigint)

                    sort order: +

                    Map-reduce partition columns: end_user_id (type: bigint)

                    Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE

                    value expressions: user_grade (type: double), user_grade_type (type: double)

        Reducer 2 

            Reduce Operator Tree:

              Group By Operator

                keys: KEY._col0 (type: string)

                mode: mergepartial

                outputColumnNames: _col0

                Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE

                Reduce Output Operator

                  key expressions: _col0 (type: string)

                  sort order: +

                  Map-reduce partition columns: _col0 (type: string)

                  Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE

        Reducer 3 

            Reduce Operator Tree:

              Merge Join Operator

                condition map:

                     Left Outer Join0 to 1

                     Left Outer Join0 to 2

                keys:

                  0 lower(_col107) (type: string)

                  1 _col0 (type: string)

                  2 _col0 (type: string)

                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146

                Statistics: Num rows: 788865883 Data size: 360827596199 Basic stats: COMPLETE Column stats: NONE

                Reduce Output Operator

                  key expressions: _col107 (type: string)

                  sort order: +

                  Map-reduce partition columns: _col107 (type: string)

                  Statistics: Num rows: 788865883 Data size: 360827596199 Basic stats: COMPLETE Column stats: NONE

                  value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col84 (type: string), _col103 (type: double), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col131 (type: bigint), _col133 (type: int), _col145 (type: string), _col146 (type: string)

        Reducer 4 

            Reduce Operator Tree:

              Merge Join Operator

                condition map:

                     Left Outer Join0 to 1

                keys:

                  0 _col107 (type: string)

                  1 email (type: string)

                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146, _col157

                Statistics: Num rows: 867752490 Data size: 396910364421 Basic stats: COMPLETE Column stats: NONE

                Map Join Operator

                  condition map:

                       Left Outer Join0 to 1

                  keys:

                    0 _col72 (type: string)

                    1 usr_union_logon_code (type: string)

                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146, _col157, _col163

                  input vertices:

                    1 Map 12

                  Statistics: Num rows: 954527759 Data size: 436601410326 Basic stats: COMPLETE Column stats: NONE

                  Filter Operator

                    predicate: ((_col84 >= ‘2015-05-04‘) or (length(_col131) > 0)) (type: boolean)

                    Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE

                    Select Operator

                      expressions: _col0 (type: bigint), _col1 (type: string), _col3 (type: string), _col2 (type: string), _col4 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col107 (type: string), substr(_col107, instr(_col107‘@‘), (instr(_col107‘.‘) - instr(_col107‘@‘))) (type: string), _col44 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col32 (type: double), _col31 (type: double), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col111 (type: string), _col34 (type: double), _col83 (type: bigint), CASE WHEN ((_col78 = 1.0)) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col110 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col1 like ‘%@sina%‘)) THEN (1) WHEN ((_col1 like ‘%@pingan%‘)) THEN (2) WHEN (((_col1 like ‘%@alipay%‘) and (not ((_col117) IN (1, 2) and (_col118 = 1.0))))) THEN (3) WHEN ((((_col1 like ‘%@alipay%‘) and (_col117) IN (1, 2)) and (_col118 = 1.0))) THEN (4) WHEN ((_col1 like ‘%@163%‘)) THEN (5) WHEN ((_col1 like ‘%@kaixin001%‘)) THEN (6) WHEN ((_col1 like ‘%@139%‘)) THEN (7) WHEN ((_col1 like ‘%@msn.com%‘)) THEN (8) WHEN ((_col1 like ‘%@anyue%‘)) THEN (9) WHEN ((_col1 like ‘%@qq%‘)) THEN (10) ELSE (0) END (type: int), CASE WHEN (((_col1 like ‘%@b2b%‘) or (_col1 = ‘yuxiaolan‘))) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col145 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col146 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col125 = ‘1‘)) THEN (1) ELSE (0) END (type: int), _col126 (type: string), _col157 (type: double), ‘‘ (type: string), ‘‘ (type: string), ‘‘ (type: string), ‘‘ (type: string), _col103 (type: double), CASE WHEN (_col72 is null) THEN (null) WHEN ((_col72 is not null and _col163 is null)) THEN (-999999) ELSE (_col163) END (type: bigint), _col72 (type: string), _col133 (type: int)

                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38

                      Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE

                      File Output Operator

                        compressed: false

                        Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE

                        table:

                            input format: org.apache.hadoop.mapred.TextInputFormat

                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

        Reducer 6 

            Reduce Operator Tree:

              Group By Operator

                keys: KEY._col0 (type: string)

                mode: mergepartial

                outputColumnNames: _col0

                Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE

                Reduce Output Operator

                  key expressions: _col0 (type: string)

                  sort order: +

                  Map-reduce partition columns: _col0 (type: string)

                  Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE

        Reducer 8 

            Reduce Operator Tree:

              Merge Join Operator

                condition map:

                     Left Outer Join0 to 1

                     Left Outer Join0 to 2

                     Left Outer Join0 to 3

                keys:

                  0 id (type: bigint)

                  1 end_user_id (type: bigint)

                  2 end_usr_id (type: bigint)

                  3 end_user_id (type: bigint)

                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133

                Statistics: Num rows: 358575394 Data size: 164012540172 Basic stats: COMPLETE Column stats: NONE

                Reduce Output Operator

                  key expressions: lower(_col107) (type: string)

                  sort order: +

                  Map-reduce partition columns: lower(_col107) (type: string)

                  Statistics: Num rows: 358575394 Data size: 164012540172 Basic stats: COMPLETE Column stats: NONE

                  value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col84 (type: string), _col103 (type: double), _col107 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col131 (type: bigint), _col133 (type: int)


  Stage: Stage-0

    Fetch Operator

      limit: -1

      Processor Tree:

        ListSink

Tez 合并了多个join给一个reduce,避免了启动过多的job(见粗体部分)已经有所优化,但是为啥此sql执行慢呢?

-------------------------------------------------------------------------------- 
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED 
-------------------------------------------------------------------------------- 
Map 1 .......... SUCCEEDED 1 1 0 0 0 0 
Map 10 ......... SUCCEEDED 3 3 0 0 0 0 
Map 11 ......... SUCCEEDED 151 151 0 0 0 0 
Map 12 ......... SUCCEEDED 1 1 0 0 0 0 
Map 13 ......... SUCCEEDED 76 76 0 0 0 0 
Map 5 .......... SUCCEEDED 11 11 0 0 0 0 
Map 7 .......... SUCCEEDED 156 156 0 0 0 0 
Map 9 .......... SUCCEEDED 10 10 0 0 0 0 
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0 
Reducer 3 ..... RUNNING 642 641 1 0 0 0 
Reducer 4 RUNNING 1009 0 89 920 0 0 
Reducer 6 ...... SUCCEEDED 3 3 0 0 0 0 
Reducer 8 ...... SUCCEEDED 203 203 0 0 0 0 
-------------------------------------------------------------------------------- 
VERTICES: 11/13 [==============>>------------] 55% ELAPSED TIME: 307.54 s   

Reduce 3 中有1个任务始终RUNNING

查看SQL中最后的where 条件发现

where t.update_time>= ‘2015-05-04‘   or t8.end_user_id>0;  导致无法提前过滤掉t.update_time< ‘2015-05-04‘的数据,导致后续join产生的reduce数据过大(见上面执行计划蓝色部分),可能产生了数据倾斜,导致任务时长拉长。

 

去掉or t8.end_user_id>0的时候,end_user   TS返回36219736

 

Map 7

            MapOperator Tree:

               TableScan

                 alias: t

                 Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE

                 Filter Operator

                   predicate: (update_time >=‘2015-05-04‘) (type: boolean)

                   Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETEColumn stats: NONE

                   Reduce Output Operator

                     key expressions: id (type: bigint)

                     sort order: +

                     Map-reduce partition columns: id (type: bigint)

                     Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETEColumn stats: NONE

                     value expressions: end_user_name (type: string),

没有去掉or t8.end_user_id>0的时候,end_user   TS返回108659208

  Map 7

            MapOperator Tree:

               TableScan

                 alias: t

                 Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE

                 Reduce Output Operator

                   key expressions: id (type: bigint)

                   sort order: +

                   Map-reduce partition columns: id (type: bigint)

                   Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE

                   value expressions: end_user_name (type: string),end_user_password (type: string), end_user_real_name (type: string),end_user_birthday (type: string), end_user_last_login_date (type: string),end_user_last_bought_date (type: string), end_user_login_times (type: double),end_user_bought_amount (type: double), end_user_bought_times (type: double),end_user_sex (type: double), end_user_create_time (type: string), end_user_type(type: double), ip (type: string), end_user_points (type: double), co_code(type: string), is_email_activate (type: double), mc_site_id (type: bigint),update_time (type: string), member_grade (type: double), end_user_email (type:string), mobile (type: string), phone (type: string), valid_mobile_phone_num(type: int), id_card (type: string)


业务部门说可以去掉or t8.end_user_id>0 这样就大大减少了 reduce数据的结果集。

最后附上去掉or t8.end_user_id>0的执行计划:

STAGE DEPENDENCIES:

  Stage-1 is a root stage

  Stage-0 depends on stages: Stage-1


STAGE PLANS:

  Stage: Stage-1

    Tez

      Edges:

        Reducer 2 <- Map 1 (SIMPLE_EDGE)

        Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE), Reducer 8 (SIMPLE_EDGE)

        Reducer 4 <- Map 11 (SIMPLE_EDGE), Map 12 (BROADCAST_EDGE), Reducer 3 (SIMPLE_EDGE)

        Reducer 6 <- Map 5 (SIMPLE_EDGE)

        Reducer 8 <- Map 10 (SIMPLE_EDGE), Map 13 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE)

      DagName: zhoushugang_20150505143030_fda7333f-2147-4a65-a63f-7c08c8a3630b:1

      Vertices:

        Map 1 

            Map Operator Tree:

                TableScan

                  alias: yhd_send_message_blacklist

                  Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

                  Select Operator

                    expressions: send_to (type: string)

                    outputColumnNames: send_to

                    Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

                    Group By Operator

                      keys: lower(send_to) (type: string)

                      mode: hash

                      outputColumnNames: _col0

                      Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

                      Reduce Output Operator

                        key expressions: _col0 (type: string)

                        sort order: +

                        Map-reduce partition columns: _col0 (type: string)

                        Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

        Map 10 

            Map Operator Tree:

                TableScan

                  alias: t5

                  Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE

                  Reduce Output Operator

                    key expressions: end_usr_id (type: bigint)

                    sort order: +

                    Map-reduce partition columns: end_usr_id (type: bigint)

                    Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE

                    value expressions: car (type: string), shopping_habit (type: string)

        Map 11 

            Map Operator Tree:

                TableScan

                  alias: t6

                  Statistics: Num rows: 38930100 Data size: 4515891712 Basic stats: COMPLETE Column stats: NONE

                  Filter Operator

                    predicate: user_id is not null (type: boolean)

                    Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE

                    Reduce Output Operator

                      key expressions: email (type: string)

                      sort order: +

                      Map-reduce partition columns: email (type: string)

                      Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE

                      value expressions: mail_level (type: double)

        Map 12 

            Map Operator Tree:

                TableScan

                  alias: t7

                  Statistics: Num rows: 16 Data size: 1752 Basic stats: COMPLETE Column stats: NONE

                  Filter Operator

                    predicate: usr_union_logon_code is not null (type: boolean)

                    Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE

                    Reduce Output Operator

                      key expressions: usr_union_logon_code (type: string)

                      sort order: +

                      Map-reduce partition columns: usr_union_logon_code (type: string)

                      Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE

                      value expressions: usr_union_logon_id (type: bigint)

        Map 13 

            Map Operator Tree:

                TableScan

                  alias: t8

                  Statistics: Num rows: 25660729 Data size: 3178607508 Basic stats: COMPLETE Column stats: NONE

                  Filter Operator

                    predicate: (to_date(update_time) = ‘2015-05-04‘) (type: boolean)

                    Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE

                    Reduce Output Operator

                      key expressions: end_user_id (type: bigint)

                      sort order: +

                      Map-reduce partition columns: end_user_id (type: bigint)

                      Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE

                      value expressions: exp (type: int)

        Map 5 

            Map Operator Tree:

                TableScan

                  alias: edm_error_email

                  Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

                  Select Operator

                    expressions: email (type: string)

                    outputColumnNames: email

                    Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

                    Group By Operator

                      keys: lower(email) (type: string)

                      mode: hash

                      outputColumnNames: _col0

                      Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

                      Reduce Output Operator

                        key expressions: _col0 (type: string)

                        sort order: +

                        Map-reduce partition columns: _col0 (type: string)

                        Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

        Map 7 

            Map Operator Tree:

                TableScan

                  alias: t

                  Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE

                  Filter Operator

                    predicate: (update_time >= ‘2015-05-04‘) (type: boolean)

                    Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETE Column stats: NONE

                    Reduce Output Operator

                      key expressions: id (type: bigint)

                      sort order: +

                      Map-reduce partition columns: id (type: bigint)

                      Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETE Column stats: NONE

                      value expressions: end_user_name (type: string), end_user_password (type: string), end_user_real_name (type: string), end_user_birthday (type: string), end_user_last_login_date (type: string), end_user_last_bought_date (type: string), end_user_login_times (type: double), end_user_bought_amount (type: double), end_user_bought_times (type: double), end_user_sex (type: double), end_user_create_time (type: string), end_user_type (type: double), ip (type: string), end_user_points (type: double), co_code (type: string), is_email_activate (type: double), mc_site_id (type: bigint), member_grade (type: double), end_user_email (type: string), mobile (type: string), phone (type: string), valid_mobile_phone_num (type: int), id_card (type: string)

        Map 9 

            Map Operator Tree:

                TableScan

                  alias: t2

                  Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE

                  Reduce Output Operator

                    key expressions: end_user_id (type: bigint)

                    sort order: +

                    Map-reduce partition columns: end_user_id (type: bigint)

                    Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE

                    value expressions: user_grade (type: double), user_grade_type (type: double)

        Reducer 2 

            Reduce Operator Tree:

              Group By Operator

                keys: KEY._col0 (type: string)

                mode: mergepartial

                outputColumnNames: _col0

                Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE

                Reduce Output Operator

                  key expressions: _col0 (type: string)

                  sort order: +

                  Map-reduce partition columns: _col0 (type: string)

                  Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE

        Reducer 3 

            Reduce Operator Tree:

              Merge Join Operator

                condition map:

                     Left Outer Join0 to 1

                     Left Outer Join0 to 2

                keys:

                  0 lower(_col107) (type: string)

                  1 _col0 (type: string)

                  2 _col0 (type: string)

                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146

                Statistics: Num rows: 262955293 Data size: 120275865393 Basic stats: COMPLETE Column stats: NONE

                Reduce Output Operator

                  key expressions: _col107 (type: string)

                  sort order: +

                  Map-reduce partition columns: _col107 (type: string)

                  Statistics: Num rows: 262955293 Data size: 120275865393 Basic stats: COMPLETE Column stats: NONE

                  value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col103 (type: double), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col133 (type: int), _col145 (type: string), _col146 (type: string)

        Reducer 4 

            Reduce Operator Tree:

              Merge Join Operator

                condition map:

                     Left Outer Join0 to 1

                keys:

                  0 _col107 (type: string)

                  1 email (type: string)

                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146, _col157

                Statistics: Num rows: 289250828 Data size: 132303454799 Basic stats: COMPLETE Column stats: NONE

                Map Join Operator

                  condition map:

                       Left Outer Join0 to 1

                  keys:

                    0 _col72 (type: string)

                    1 usr_union_logon_code (type: string)

                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146, _col157, _col163

                  input vertices:

                    1 Map 12

                  Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE

                  Select Operator

                    expressions: _col0 (type: bigint), _col1 (type: string), _col3 (type: string), _col2 (type: string), _col4 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col107 (type: string), substr(_col107, instr(_col107‘@‘), (instr(_col107‘.‘) - instr(_col107‘@‘))) (type: string), _col44 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col32 (type: double), _col31 (type: double), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col111 (type: string), _col34 (type: double), _col83 (type: bigint), CASE WHEN ((_col78 = 1.0)) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col110 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col1 like ‘%@sina%‘)) THEN (1) WHEN ((_col1 like ‘%@pingan%‘)) THEN (2) WHEN (((_col1 like ‘%@alipay%‘) and (not ((_col117) IN (1, 2) and (_col118 = 1.0))))) THEN (3) WHEN ((((_col1 like ‘%@alipay%‘) and (_col117) IN (1, 2)) and (_col118 = 1.0))) THEN (4) WHEN ((_col1 like ‘%@163%‘)) THEN (5) WHEN ((_col1 like ‘%@kaixin001%‘)) THEN (6) WHEN ((_col1 like ‘%@139%‘)) THEN (7) WHEN ((_col1 like ‘%@msn.com%‘)) THEN (8) WHEN ((_col1 like ‘%@anyue%‘)) THEN (9) WHEN ((_col1 like ‘%@qq%‘)) THEN (10) ELSE (0) END (type: int), CASE WHEN (((_col1 like ‘%@b2b%‘) or (_col1 = ‘yuxiaolan‘))) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col145 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col146 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col125 = ‘1‘)) THEN (1) ELSE (0) END (type: int), _col126 (type: string), _col157 (type: double), ‘‘ (type: string), ‘‘ (type: string), ‘‘ (type: string), ‘‘ (type: string), _col103 (type: double), CASE WHEN (_col72 is null) THEN (null) WHEN ((_col72 is not null and _col163 is null)) THEN (-999999) ELSE (_col163) END (type: bigint), _col72 (type: string), _col133 (type: int)

                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38

                    Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE

                    File Output Operator

                      compressed: false

                      Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE

                      table:

                          input format: org.apache.hadoop.mapred.TextInputFormat

                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

        Reducer 6 

            Reduce Operator Tree:

              Group By Operator

                keys: KEY._col0 (type: string)

                mode: mergepartial

                outputColumnNames: _col0

                Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE

                Reduce Output Operator

                  key expressions: _col0 (type: string)

                  sort order: +

                  Map-reduce partition columns: _col0 (type: string)

                  Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE

        Reducer 8 

            Reduce Operator Tree:

              Merge Join Operator

                condition map:

                     Left Outer Join0 to 1

                     Left Outer Join0 to 2

                     Left Outer Join0 to 3

                keys:

                  0 id (type: bigint)

                  1 end_user_id (type: bigint)

                  2 end_usr_id (type: bigint)

                  3 end_user_id (type: bigint)

                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133

                Statistics: Num rows: 119525131 Data size: 54670846721 Basic stats: COMPLETE Column stats: NONE

                Reduce Output Operator

                  key expressions: lower(_col107) (type: string)

                  sort order: +

                  Map-reduce partition columns: lower(_col107) (type: string)

                  Statistics: Num rows: 119525131 Data size: 54670846721 Basic stats: COMPLETE Column stats: NONE

                  value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col103 (type: double), _col107 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col133 (type: int)


  Stage: Stage-0

    Fetch Operator

      limit: -1

      Processor Tree:

        ListSink


hive on tez sql 优化

标签:hive tez sql 优化

原文地址:http://tangjj.blog.51cto.com/1848040/1642541

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!