Tags: des style io color ar os use java for
# Add a database user
insert into mysql.user(Host,User,Password) values("%","longsheng",password("longsheng1234"));
# Grant database privileges:
GRANT all ON hive.* TO longsheng@'longV007' IDENTIFIED BY 'sheng';
flush privileges;
# Set the database binary log format
set global binlog_format='MIXED';
# Start Hive in cluster mode
hive --auxpath /usr/share/hive/lib/hive-hbase-handler-0.13.1.jar,/usr/share/hive/lib/zookeeper-3.4.6.jar
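(Optional parameter: -hiveconf hive.root.logger=DEBUG,console)
# Create a Hive table linked to an HBase table:
* bidask_quote_hive is the Hive table
* "hbase.table.name" = "bidask_quote" names a table that already exists in HBase
Next, link the two with the following command:
# Command to create the linked table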
hive> CREATE EXTERNAL TABLE bidask_quote_hive(key string,ProdCode string,ProdName string,TradingDay string,ExchangeID string,ExchangeInstID string,LastPrice string,PreSettlementPrice string,PreClosePrice string,PreOpenInterest string,OpenPrice string,HighestPrice string,LowestPrice string,Volume string,Turnover string,TotalVolume string,TotalTurnover string,OpenInterest string,ClosePrice string,SettlementPrice string,UpperLimitPrice string,LowerLimitPrice string,PreDelta string,CurrDelta string,UpdateTime string,UpdateMillisec string,BidPrice1 string,BidVolume1 string,AskPrice1 string,AskVolume1 string,BidPrice2 string,BidVolume2 string,AskPrice2 string,AskVolume2 string,BidPrice3 string,BidVolume3 string,AskPrice3 string,AskVolume3 string,BidPrice4 string,BidVolume4 string,AskPrice4 string,AskVolume4 string,BidPrice5 string,BidVolume5 string,AskPrice5 string,AskVolume5 string,AveragePrice string,Equil string,CloseDate string,DecInPrice string,logDateTime string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:ProdCode,info:ProdName,info:TradingDay,info:ExchangeID,info:ExchangeInstID,info:LastPrice,info:PreSettlementPrice,info:PreClosePrice,info:PreOpenInterest,info:OpenPrice,info:HighestPrice,info:LowestPrice,info:Volume,info:Turnover,info:TotalVolume,info:TotalTurnover,info:OpenInterest,info:ClosePrice,info:SettlementPrice,info:UpperLimitPrice,info:LowerLimitPrice,info:PreDelta,info:CurrDelta,info:UpdateTime,info:UpdateMillisec,info:BidPrice1,info:BidVolume1,info:AskPrice1,info:AskVolume1,info:BidPrice2,info:BidVolume2,info:AskPrice2,info:AskVolume2,info:BidPrice3,info:BidVolume3,info:AskPrice3,info:AskVolume3,info:BidPrice4,info:BidVolume4,info:AskPrice4,info:AskVolume4,info:BidPrice5,info:BidVolume5,info:AskPrice5,info:AskVolume5,info:AveragePrice,info:Equil,info:CloseDate,info:DecInPrice,info:logDateTime")
TBLPROPERTIES("hbase.table.name" = "bidask_quote");
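My table has many fields, more than 50.
Notes:
1. bidask_quote_hive(key string,ProdCode string,ProdName string,...) is the structure of the Hive table.
2. "hbase.columns.mapping" = "info:ProdCode,info:ProdName,info:TradingDay,..." lists the HBase columns; there is currently only one column family (info). The mapping has no entry for the row key, so Hive maps the first Hive column (key) to the HBase row key implicitly; the Hive documentation recommends listing it explicitly as :key.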
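With 50-odd columns, the Hive column list and the HBase mapping are easy to get out of step when typed by hand. A minimal sketch of generating the statement from one column list follows; the function name build_hbase_ddl is hypothetical, and unlike the command above it writes an explicit :key entry for the row key.

```python
def build_hbase_ddl(hive_table, hbase_table, family, columns):
    """Return HiveQL that maps hive_table onto an existing HBase table.

    Hypothetical helper: every column is typed as string and placed in a
    single column family, mirroring the table in this article.
    """
    # The first Hive column is the row key; the rest follow the given order.
    hive_cols = ["key string"] + [c + " string" for c in columns]
    # One mapping entry per Hive column, in the same order.
    # ":key" is written explicitly here (an assumption, not the original command).
    mapping = [":key"] + [family + ":" + c for c in columns]
    if len(hive_cols) != len(mapping):
        raise ValueError("Hive columns and HBase mapping differ in length")

    col_sql = ",".join(hive_cols)
    map_sql = ",".join(mapping)
    return (
        "CREATE EXTERNAL TABLE " + hive_table + "(" + col_sql + ")\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "' + map_sql + '")\n'
        'TBLPROPERTIES("hbase.table.name" = "' + hbase_table + '");'
    )


if __name__ == "__main__":
    # Only the first three data columns are listed to keep the output short.
    print(build_hbase_ddl("bidask_quote_hive", "bidask_quote", "info",
                          ["ProdCode", "ProdName", "TradingDay"]))
```

The printed statement can be pasted at the hive> prompt; the full column list would replace the three-column sample.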
hive> show tables;
hive> select * from bidask_quote_hive limit 3;
hive> select count(1) from bidask_quote_hive;
hive> select key,ProdCode,TradingDay from bidask_quote_hive limit 3;
Aliases add little to a single-table query, but they are useful in join queries:
hive> select b.ProdCode,b.TradingDay from bidask_quote_hive b limit 3;
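hive> select symbol,`price.*` from stocks;
(Note: this statement selects the symbol column and every column whose name starts with price. The regular expression is enclosed in backticks.)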
hive> select upper(name),deductions["Federal Taxes"],round(salary * (1 - deductions["Federal Taxes"])) from employees;
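(Notes: 1. upper(name) converts the value of the name field to upper case.
2. deductions is a Map-typed field; only the value stored under the key Federal Taxes is read.
3. round() rounds the result to the nearest integer.)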
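To make the three steps concrete, here is a rough Python equivalent of that select for a single row. It is a sketch under assumptions: the sample employee and the function name net_salary are invented, and half-up rounding is used on the assumption that it matches what Hive's round() does.

```python
from decimal import Decimal, ROUND_HALF_UP


def net_salary(name, salary, deductions):
    """Mimic: upper(name), deductions["Federal Taxes"], round(salary * (1 - rate))."""
    rate = deductions["Federal Taxes"]          # map lookup by key
    net = Decimal(str(salary)) * (Decimal("1") - Decimal(str(rate)))
    # Round half up to a whole number (assumed to match Hive's round()).
    rounded = net.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return name.upper(), rate, int(rounded)


if __name__ == "__main__":
    # Hypothetical row; the real employees table is not shown in this article.
    row = net_salary("John Doe", 100000.0,
                     {"Federal Taxes": 0.2, "State Taxes": 0.05})
    print(row)  # ('JOHN DOE', 0.2, 80000)
```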
hive> select * from bidask_quote_hive limit 3;
hive> select upper(name),deductions["Federal Taxes"] as fed_taxs from employees;
hive> from(
> select upper(name) as name,deductions["Federal Taxes"] as fed_taxs,
> round(salary * (1 - deductions["Federal Taxes"])) as salary_minus_fed_taxes from employees
> ) e
> select e.name, e.salary_minus_fed_taxes
> where e.salary_minus_fed_taxes > 7000;
hive> select * from employees where country = 'US' and state = 'CA';
Example 1:
hive> select name,address.street from employees where address.street like '%Ave.';
John 1 Michigan Ave.
Todd 200 Chicago Ave.
Example 2:
hive> select name,address.street from employees where address.street like '%Chi%';
Todd 200 Chicago Ave.
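RLIKE matches a column against a regular expression and accepts Java regular expression syntax. For example:
hive> select name, address.street from employees where address.street rlike '.*(Chicago|Ontario).*';
Mary 100 Ontario St.
Todd 200 Chicago Ave.
Note: in the regular expression above, a period (.) matches any character, and an asterisk (*) repeats the element to its left zero or more times. The expression (x|y) matches either x or y.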
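The pattern can be tried outside Hive. The sketch below uses Python's re module rather than Java's regex engine; for a pattern this simple the two behave the same. The helper name rlike and the street list are illustrative only.

```python
import re

# Same pattern as in the rlike query above.
PATTERN = re.compile(r".*(Chicago|Ontario).*")


def rlike(value, pattern=PATTERN):
    """Return True when the pattern matches somewhere in value."""
    return pattern.search(value) is not None


if __name__ == "__main__":
    streets = ["1 Michigan Ave.", "100 Ontario St.", "200 Chicago Ave."]
    print([s for s in streets if rlike(s)])
    # ['100 Ontario St.', '200 Chicago Ave.']
```

Because this helper uses search, the shorter pattern Chicago|Ontario selects the same rows here; the leading and trailing .* only matter when the whole string has to match.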
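Note: GROUP BY is usually used together with aggregate functions. It groups rows by one or more columns and then applies the aggregate to each group.
Example 1: given a trading table stocks, the following query groups the records for Apple stock (ticker APPL) by year and computes the average closing price for each year.
hive> select year(ymd),avg(price_close) from stocks where exchange = 'NASDAQ' AND symbol = 'APPL' GROUP BY year(ymd);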
1984 25.5786
1985 20.1936
1987 53.8456
Example 2: the HAVING clause, which filters the groups produced by GROUP BY
hive> select year(ymd),avg(price_close) from stocks where exchange = 'NASDAQ' AND symbol = 'APPL' GROUP BY year(ymd)
> having avg(price_close) > 50.0;
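What GROUP BY and HAVING do can be sketched in a few lines of Python: rows are bucketed by year, each bucket is averaged, and HAVING then filters the averaged groups rather than the input rows. The function name and the sample prices are made up for illustration.

```python
from collections import defaultdict


def yearly_average_close(rows, min_avg=None):
    """Group (ymd, price_close) rows by year and average each group.

    min_avg plays the role of HAVING avg(price_close) > min_avg.
    """
    groups = defaultdict(list)
    for ymd, price_close in rows:
        groups[int(ymd[:4])].append(price_close)    # GROUP BY year(ymd)

    averages = {year: sum(p) / len(p) for year, p in groups.items()}
    if min_avg is not None:
        # HAVING is applied after aggregation, to whole groups.
        averages = {y: a for y, a in averages.items() if a > min_avg}
    return dict(sorted(averages.items()))


if __name__ == "__main__":
    # Invented sample rows, not real quotes.
    rows = [("1984-09-07", 25.0), ("1984-09-10", 27.0),
            ("1987-05-11", 77.0), ("1987-08-10", 49.0)]
    print(yearly_average_close(rows))        # {1984: 26.0, 1987: 63.0}
    print(yearly_average_close(rows, 50.0))  # {1987: 63.0}
```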
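In an inner join, a row is kept only when both joined tables contain a record that satisfies the join condition.
Example 1: compare the share prices of Apple (APPL) and IBM (IBM). The stocks table is joined with itself, and the join condition is that ymd (year-month-day) is equal.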
hive> select a.ymd, a.price_close, b.price_close
> from stocks a join stocks b on a.ymd = b.ymd
> where a.symbol = 'APPL' and b.symbol = 'IBM';
2010-01-04 210.21 132.45
2010-01-05 214.41 130.26
2010-01-06 211.61 134.87
2010-01-07 212.01 133.45
# JOIN statements
hive> select s.ymd, s.symbol, s.price_close, d.dividend
> from stocks s join dividend d on s.ymd = d.ymd and s.symbol = d.symbol
> where s.symbol = 'APPL';
1987-05-11 APPL 77.01 0.015
1987-08-10 APPL 48.61 0.015
1987-11-17 APPL 78.01 0.002
JOIN across three tables:
hive> select a.ymd, a.price_close, b.price_close,c.price_close
> from stocks a join stocks b on a.ymd = b.ymd
> join stocks c on a.ymd = c.ymd
> where a.symbol = 'APPL' and b.symbol = 'IBM' and c.symbol = 'GE';
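Where possible, put the largest table last in the join: Hive buffers the earlier tables and streams the last one through the reducers.
Left outer join: every record from the table on the left of the join that satisfies the WHERE clause is returned. When the table on the right has no record that satisfies the ON condition, the right table's columns are NULL. For example: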
hive> select s.ymd, s.symbol, s.price_close, d.dividend
> from stocks s left outer join dividend d on s.ymd = d.ymd and s.symbol = d.symbol
> where s.symbol = 'APPL';
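The NULL-filling behaviour of a left outer join can be sketched in Python, with None standing in for NULL. This is an illustration only: the function name is hypothetical, the 1987-05-12 row is invented, and the sketch assumes at most one dividend per (ymd, symbol) pair.

```python
def left_outer_join(stocks, dividends, symbol):
    """Mimic the left outer join on (ymd, symbol), filtered by symbol."""
    # Index the right-hand table by the join key.
    by_key = {(d["ymd"], d["symbol"]): d["dividend"] for d in dividends}
    result = []
    for s in stocks:
        if s["symbol"] != symbol:               # WHERE s.symbol = ...
            continue
        # Every left row is kept; a missing right row yields None (NULL).
        dividend = by_key.get((s["ymd"], s["symbol"]))
        result.append((s["ymd"], s["symbol"], s["price_close"], dividend))
    return result


if __name__ == "__main__":
    stocks = [
        {"ymd": "1987-05-11", "symbol": "APPL", "price_close": 77.01},
        {"ymd": "1987-05-12", "symbol": "APPL", "price_close": 75.50},  # invented row
        {"ymd": "1987-05-11", "symbol": "IBM", "price_close": 160.0},   # invented row
    ]
    dividends = [{"ymd": "1987-05-11", "symbol": "APPL", "dividend": 0.015}]
    for row in left_outer_join(stocks, dividends, "APPL"):
        print(row)
    # ('1987-05-11', 'APPL', 77.01, 0.015)
    # ('1987-05-12', 'APPL', 75.5, None)
```

An inner join would drop the second row instead of returning it with an empty dividend.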
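ORDER BY: sorts the entire result set (total ordering).
SORT BY: sorts the data within each reducer only. This is a local sort: each reducer's output is ordered, but the combined result may not be. For example: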
hive> select s.ymd, s.symbol, s.price_close
> from stocks s
> order by s.ymd ASC, s.symbol DESC;
(Note: replace order by with sort by to sort within each reducer only.)
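The difference between the two clauses can be simulated in plain Python. This is a toy model, not how Hive is implemented: the function names are hypothetical, and rows are dealt to reducers round-robin, which only stands in for whatever distribution Hive actually uses.

```python
def order_by(rows, key=None):
    """Total ordering: one sorted list covering every row."""
    return sorted(rows, key=key)


def sort_by(rows, num_reducers, key=None):
    """Local ordering: each simulated reducer sorts only its own rows."""
    partitions = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        partitions[i % num_reducers].append(row)   # round-robin (assumption)
    return [sorted(p, key=key) for p in partitions]


if __name__ == "__main__":
    ymds = ["2010-01-06", "2010-01-04", "2010-01-07", "2010-01-05"]
    print(order_by(ymds))
    # ['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07']
    print(sort_by(ymds, num_reducers=2))
    # [['2010-01-06', '2010-01-07'], ['2010-01-04', '2010-01-05']]
```

Each inner list is ordered, but concatenating them does not give a globally sorted result, which is the trade-off SORT BY makes.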
Original article: http://my.oschina.net/AlbertHa/blog/341372