Functions

1. Hive function categories
Standard functions: a user-defined standard function is called a UDF; it takes one row and returns one result.
Aggregate functions: a user-defined aggregate function is called a UDAF; it takes multiple rows and returns one result (used with GROUP BY, e.g. sum, count).
Table-generating functions: a user-defined table-generating function is called a UDTF; it takes one row and returns multiple rows (e.g. explode).

2. Function documentation
SHOW FUNCTIONS;                  -- list all functions in the current environment
DESC FUNCTION length;            -- show a function's help text
DESC FUNCTION EXTENDED length;   -- show extended information about a function

3. Math functions
round: rounding
rand: random number
conv: base conversion, conv(num, from_base, to_base) -- num is the value, from_base the source base, to_base the target base
cast: type conversion
select cast(id as STRING) from student;   -- convert an INT column to STRING

4. Date functions
from_unixtime: convert a Unix timestamp to a string
select from_unixtime(1567995900, 'yyyyMMdd HHmmss');
unix_timestamp: convert a string to a Unix timestamp
select unix_timestamp('2011-12-10 13:10:12');

5. Conditional functions
IF(expr1, expr2, expr3) !!!   -- if expr1 is true, return expr2, otherwise return expr3
SELECT name, age, if(age > 20, 'A', 'B') as type FROM student;
coalesce: return the first non-NULL argument
SELECT coalesce(null, 12, 3, 14, null);
CASE !!!   -- bucket rows into groups
SELECT id,
       CASE WHEN id <= 2 THEN 'a'
            WHEN id > 2 AND id <= 10 THEN 'b'
            ELSE 'c'
       END
FROM student;

SELECT mycol, sum(id)
FROM (SELECT id,
             (CASE WHEN id <= 2 THEN 'a'
                   WHEN id > 2 AND id <= 10 THEN 'b'
                   ELSE 'c'
              END) as mycol
      FROM student) t
GROUP BY mycol;

SELECT CASE id WHEN 2 THEN 'a' ELSE 'b' END from student;

6. String functions
Concatenation:
concat: concatenates arguments of any type, with no separator
select id, concat(name, age) from student;
concat_ws: concatenates string arguments only, with a given separator
select concat_ws('_', 'asdfasd', '18');
instr(string str, string substr): position of substr within str
select instr('candle', 'dle');
length: string length
lower / lcase: to lowercase
upper / ucase: to uppercase

Regular expressions:
regexp_extract: extract by regular expression, regexp_extract(str, regexp[, idx])
  str    -- the string to process
  regexp -- the regular expression to match against
  idx    -- group index: 0 = the whole match, 1 = the first parenthesized group, 2 = the second group, ...
select regexp_extract('x=asdfasdf12345asdf', 'x=([a-z]+)([0-9]+)([a-z]+)', 3);
regexp_replace: replace by regular expression
select regexp_replace('x=asdfasdf12345asdf3456345', '([0-9]+)', 'xxxx');

URL:
parse_url: dedicated URL parser, e.g. for
http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere
select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere', 'REF');
Parts that can be extracted: HOST (domain), PATH, QUERY, REF (anchor), PROTOCOL, FILE, AUTHORITY (host + port), USERINFO

Output:
hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere', 'HOST');
OK
_c0
www.candle.com
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere', 'PATH');
OK
_c0
/dir1/dir2/dir3/file.html
Time taken: 0.093 seconds, Fetched: 1 row(s)
hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere', 'QUERY');
OK
_c0
key1=value1&key2=value2
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere', 'QUERY', 'key1');
OK
_c0
value1
Time taken: 0.105 seconds, Fetched: 1 row(s)
hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere', 'REF');
OK
_c0
imhere

JSON:
get_json_object extracts values from a JSON document. The test document, pretty-printed:
{
  "store": {
    "fruit": [{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
    "bicycle": {"price":19.95,"color":"red"}
  },
  "email": "amy@only_for_json_udf_test.net",
  "owner": "amy"
}
hive (hadoop)> select get_json_object('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net","owner":"amy"}', '$.store.bicycle.price');
OK
_c0
19.95
hive (hadoop)> select get_json_object('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net","owner":"amy"}', '$.store.fruit[0].type');
OK
_c0
apple
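The group-index semantics of regexp_extract can be checked outside Hive with Python's re module. This is only an illustrative sketch; the regexp_extract function below is a hypothetical stand-in, not Hive's implementation:

```python
import re

# Hypothetical helper mirroring Hive's regexp_extract(str, regexp, idx):
# idx 0 returns the whole match, idx 1 the first capture group, and so on.
def regexp_extract(s, pattern, idx):
    m = re.search(pattern, s)
    return m.group(idx) if m else ""

# Same string and pattern as the Hive example above.
s = "x=asdfasdf12345asdf"
p = r"x=([a-z]+)([0-9]+)([a-z]+)"
print(regexp_extract(s, p, 0))  # whole match: x=asdfasdf12345asdf
print(regexp_extract(s, p, 1))  # first group: asdfasdf
print(regexp_extract(s, p, 2))  # second group: 12345
print(regexp_extract(s, p, 3))  # third group: asdf
```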
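The parse_url and get_json_object results above can also be reproduced with Python's standard library, purely to make the expected outputs easy to verify without a Hive session (urllib.parse and json are stand-ins here, not Hive code):

```python
import json
from urllib.parse import urlsplit, parse_qs

# The same URL as in the parse_url examples above.
url = "http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere"
parts = urlsplit(url)
print(parts.hostname)                    # HOST       -> www.candle.com
print(parts.path)                        # PATH       -> /dir1/dir2/dir3/file.html
print(parts.query)                       # QUERY      -> key1=value1&key2=value2
print(parse_qs(parts.query)["key1"][0])  # QUERY:key1 -> value1
print(parts.fragment)                    # REF        -> imhere
print(parts.netloc)                      # AUTHORITY  -> www.candle.com:50070

# get_json_object's path '$.store.bicycle.price' becomes plain key access
# once the document is parsed (no backslash escape needed in valid JSON).
doc = ('{"store":{"fruit":[{"weight":8,"type":"apple"},'
       '{"weight":9,"type":"pear"}],'
       '"bicycle":{"price":19.95,"color":"red"}},'
       '"email":"amy@only_for_json_udf_test.net","owner":"amy"}')
d = json.loads(doc)
print(d["store"]["bicycle"]["price"])    # -> 19.95
print(d["store"]["fruit"][0]["type"])    # -> apple
```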
7. Converting between wide and long tables !!!

explode: a table-generating function
hive (hadoop)> select explode(city) from student1;
This, however, fails:
hive (hadoop)> select id, name, explode(city) from student1;
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

Arrays:
In Hive, a table-generating function cannot be selected directly alongside ordinary columns. You first generate a temporary view with LATERAL VIEW and join it in:
select id, name, mycity from student1 lateral view explode(city) tmp_view AS mycity;
General form: SELECT id, name, mycity FROM student1 LATERAL VIEW explode(city) <view name> AS <column name>;

Maps (key-value pairs): explode produces two columns, key and value:
> select id, name, mykey, myvalue from student3 LATERAL VIEW explode(pairs) tmp_view AS mykey, myvalue;
OK
id   name     mykey   myvalue
1    candle   k1      v1
1    candle   k2      v2
2    jack     k1      v1
2    jack     k2      v2
3    tom      k1      v1

Structs: multiple fields
hive (hadoop)> select explode(addr) from student2;
FAILED: UDFArgumentException explode() takes an array or a map as a parameter
Note: struct members cannot be turned into columns of rows this way; explode only accepts arrays and maps. Turning each struct member into its own row rarely makes sense anyway. Structs are usually handled like this:
hive (hadoop)> select id, name, addr.city, addr.area, addr.streetid from student2;
OK
id   name     city       area      streetid
1    candle   shanghai   minhang   35
2    jack     beijing    haidian   100
3    tom      hefei      shushan   103
The following form is generally not used:
hive (hadoop)> select id, name, info from student2 lateral view explode(array(addr.city, addr.area, addr.streetid)) tmp_view as info;
OK
id   name     info
1    candle   shanghai
1    candle   minhang
1    candle   35
2    jack     beijing
2    jack     haidian
2    jack     100
3    tom      hefei
3    tom      shushan
3    tom      103

Long to wide: aggregate functions with GROUP BY
collect_list: merges multiple rows into one, producing an array; duplicate elements are kept
select age, collect_list(name), collect_list(id) from student group by age;
age   _c1                   _c2
18    ["candle","candle"]   [1,1]
19    ["aa","jack"]         [10,2]
20    ["c2","tom"]          [1,4]
100   ["cc"]                [12]
200   ["dd"]                [13]
collect_set: duplicate elements are removed
select age, collect_set(name), collect_set(id) from student group by age;
age   _c1             _c2
18    ["candle"]      [1]
19    ["aa","jack"]   [10,2]
20    ["c2","tom"]    [1,4]
100   ["cc"]          [12]
200   ["dd"]          [13]
select map(id, name) from student;           -- build a map
select struct(id, name, age) from student;   -- build a struct
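What explode (wide to long) plus LATERAL VIEW and collect_list/collect_set (long to wide) do can be sketched on plain Python data structures. This is an illustration on made-up sample rows, not Hive itself:

```python
# Wide rows: (id, name, array-of-cities), like the student1 table above.
rows = [(1, "candle", ["shanghai", "beijing"]),
        (2, "jack",   ["hefei"])]

# explode + LATERAL VIEW: one output row per array element,
# with the ordinary columns repeated alongside.
long_rows = [(i, name, city) for i, name, cities in rows for city in cities]
print(long_rows)
# [(1, 'candle', 'shanghai'), (1, 'candle', 'beijing'), (2, 'jack', 'hefei')]

# collect_list with GROUP BY age: gather values back into an array per group.
students = [(1, "candle", 18), (1, "candle", 18), (2, "jack", 19), (10, "aa", 19)]
by_age = {}
for sid, name, age in students:
    by_age.setdefault(age, []).append(name)
print(by_age[18])  # collect_list keeps duplicates: ['candle', 'candle']

# collect_set: same grouping, but duplicates are dropped.
collect_set = {age: sorted(set(names)) for age, names in by_age.items()}
print(collect_set[18])  # ['candle']
```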
8. Window functions !!!

Ranking within groups
1) Create the table:
CREATE TABLE windows (
  id INT,
  class STRING,
  score INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

2) row_number: PARTITION BY picks the grouping column, ORDER BY the sort column
SELECT class, score, row_number() over(partition by class order by score desc) from windows;
The numbering keeps increasing even when scores tie:
class    score   row_number
class1   78      1
class1   78      2
class1   66      3
class1   66      4
class1   66      5
class1   56      6
class1   56      7
class1   56      8
class1   56      9
class1   56      10

3) rank: tied scores get the same rank, and the next rank skips past the ties
SELECT class, score, rank() over(partition by class order by score desc) from windows;
class    score   _wcol0
class1   78      1
class1   78      1
class1   66      3
class1   66      3
class1   66      3
class1   56      6
class1   56      6
class1   56      6
class1   56      6
class1   56      6

4) dense_rank: tied scores get the same rank, and the next rank does not skip past the ties
SELECT class, score, dense_rank() over(partition by class order by score desc) from windows;
class    score   _wcol0
class1   78      1
class1   78      1
class1   66      2
class1   66      2
class1   66      2
class1   56      3
class1   56      3
class1   56      3
class1   56      3
class1   56      3

5) Per-group top-N
You cannot filter on the window column directly:
hive (hadoop)> SELECT class, score, row_number() over(partition by class order by score desc) as paiming from windows WHERE paiming <= 5;
FAILED: SemanticException [Error 10004]: Line 1:110 Invalid table alias or column reference 'paiming': (possible column names are: id, class, score)
Use a subquery instead:
SELECT class, score from (SELECT class, score, row_number() over(partition by class order by score desc) as paiming from windows) t WHERE paiming <= 5;
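The difference between row_number, rank, and dense_rank on the tied scores above can be reproduced with a short Python sketch (illustrative only, one partition, already sorted descending):

```python
# The score column from the class1 partition, sorted descending.
scores = [78, 78, 66, 66, 66, 56, 56, 56, 56, 56]

# row_number: a plain running counter, ignoring ties.
row_number = list(range(1, len(scores) + 1))

rank, dense = [], []
for i, s in enumerate(scores):
    if i > 0 and s == scores[i - 1]:
        rank.append(rank[-1])    # ties share a rank...
        dense.append(dense[-1])
    else:
        rank.append(i + 1)       # ...rank then skips past the tied block
        dense.append(dense[-1] + 1 if dense else 1)  # dense_rank does not skip

print(row_number)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(rank)        # [1, 1, 3, 3, 3, 6, 6, 6, 6, 6]
print(dense)       # [1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
```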
Original article: https://www.cnblogs.com/alpha-cat/p/13184139.html