The Hadoop ecosystem: ETL
ETL stands for the extraction, transformation, and loading of data.
(1) Install Hive
1. Unpack the tarball
# tar zvxf hive-0.13.0.tar.gz -C /usr/local
# cd /usr/local
# ln -sv /usr/local/hive-0.13.0 /usr/local/hive
2. Replace jar packages so that Hive stays consistent with HBase 0.98 and Hadoop 1.2
# cd /usr/hive/lib
# rm -rf hbase-0.94*
# find /usr/hbase/lib -name "hbase*.jar" | xargs -i cp {} ./
Pay particular attention to whether the zookeeper and protobuf jars match the ones shipped with HBase; if they differ, copy protobuf.**.jar and zookeeper-3.4.5.jar into hive/lib.
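A quick optional check (using the same paths as above) is to list the versions now sitting in each lib directory and compare them:
# ls /usr/hive/lib | grep -Ei 'zookeeper|protobuf|hbase'
# ls /usr/hbase/lib | grep -Ei 'zookeeper|protobuf'
Resolve any version mismatches between the two listings before moving on.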
3. Prepare MySQL as the metastore database
Copy the MySQL JDBC driver mysql-connector-java-5.1.10-bin.jar into Hive's lib directory as well; after copying, adjust its permissions:
# chmod 777 mysql-connector-java-5.1.10-bin.jar
4. Check that the Hive-HBase handler jar exists; if it does not, copy one into lib
# cd /usr/hive/lib
# find . -name "hive-hbase-handler*"
5. Configure MySQL
Install MySQL:
# yum install -y mysql mysql-server
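Optionally, have mysqld start on boot as well (an assumption here: a CentOS-style init system, which matches the yum-based install above):
# chkconfig mysqld on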
After installation, start MySQL and open a client session:
# service mysqld start
# mysql
MySQL-related operations:
① Change the root password (pick your own value; the user table lives in the mysql database, so switch to it first):
mysql> use mysql;
mysql> update user set password=password('hadoop') where user='root';
② Create the hive database
mysql> create database hive;
③ Grant privileges
mysql> grant all on hive.* to 'root'@'%' identified by 'hadoop';
mysql> flush privileges;
6. Edit the hive-site.xml configuration
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/lib/hive-hbase-handler-0.13.1.jar,file:///usr/local/hbase/lib/protobuf-java-2.5.0.jar,file:///usr/local/hbase/lib/hbase-client-0.98.6.1-hadoop1.jar,file:///usr/local/hive/lib/hive-common-0.13.1.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://10.15.62.228:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!--<value>jdbc:derby:;databaseName=metastore_db;create=true</value>-->
  <value>jdbc:mysql://10.15.62.228:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <!--<value>org.apache.derby.jdbc.EmbeddedDriver</value>-->
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive_user</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>MySQL password</value>
  <description>password to use against metastore database</description>
</property>
The user name and password must belong to a MySQL account that can access the hive database; the grant above was issued for root/hadoop, so if you keep hive_user here, grant it the same privileges first. Note also that hive.metastore.uris points the Hive CLI at a standalone metastore service on port 9083, which is the process started in step (2) below.
7. Create directories on HDFS and grant permissions
# hadoop fs -mkdir /hive/warehouse
# hadoop fs -mkdir /hive/scratchdir
# hadoop fs -chmod g+w /hive/warehouse
# hadoop fs -chmod g+w /hive/scratchdir
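As an optional check, list the new directories to confirm they exist with the expected permissions:
# hadoop fs -ls /hive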
The directory names created here correspond to options in the configuration file (the original post showed them as a screenshot):
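The matching hive-site.xml entries would look like the following; the property names are standard Hive options, and the values are assumed to match the directories created above:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/hive/warehouse</value>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/hive/scratchdir</value>
</property>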
8. Other configuration changes:
① Edit hadoop-env.sh in Hadoop
# vim /usr/hadoop/etc/hadoop/hadoop-env.sh
The original post showed the change as a screenshot; a sketch follows below.
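Since the screenshot is unavailable, this is an assumption: a common edit at this step in Hive/HBase integration guides is to add Hive's and HBase's jars to Hadoop's classpath, e.g.:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/hive/lib/*:/usr/hbase/lib/*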
② Modify Hive's configuration
# cd /usr/hive/conf
# mv hive-default.xml.template hive-default.xml
# mv hive-env.sh.template hive-env.sh
# cd /usr/hive/bin
# vim hive-config.sh
The original post showed the edits as a screenshot; a sketch follows below.
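hive-config.sh typically just needs the environment variables exported; the values below are assumptions based on the paths used elsewhere in this post (adjust JAVA_HOME to your actual JDK location):
export JAVA_HOME=/usr/java/jdk1.7.0
export HADOOP_HOME=/usr/hadoop
export HIVE_HOME=/usr/hive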
(2) Start Hive (after the metastore starts, the process appears to hang; leave it alone and simply open another terminal)
1. Start Hadoop and HBase
2. Start the Hive metastore
# cd /usr/hive/bin
# hive --service metastore -hiveconf hive.root.logger=DEBUG,console
3. Start hive in the newly opened terminal
# cd /usr/hive/bin
# hive
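To confirm the metastore is actually listening on the port named in hive.metastore.uris (an optional check, assuming netstat is installed):
# netstat -nltp | grep 9083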
(3) Testing
Start hive and the hbase shell in separate terminals.
hive> CREATE TABLE pokes (foo INT, bar STRING);
OK
Time taken: 6.294 seconds
hive> show tables;
OK
pokes
Time taken: 0.131 seconds, Fetched: 1 row(s)
Create an HBase-backed table in Hive; the statement is as follows:
hive> CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
OK
Time taken: 63.667 seconds
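In the mapping ":key,cf1:val", :key binds the Hive key column to the HBase row key, and cf1:val binds the value column to column family cf1, qualifier val. Once the table exists, rows can also be pushed in from the Hive side; as a sketch (not run in this session; the row shown below was written via the HBase shell instead):
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT foo, bar FROM pokes;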
After the table is created, look at the changes in both Hive and HBase (note that the row "1 abc" returned by the Hive query below was written by the HBase put shown after it):
hive> show tables;
OK
hbase_table_1
pokes
Time taken: 0.718 seconds, Fetched: 2 row(s)
hive> select * from hbase_table_1;
OK
1 abc
Time taken: 32.543 seconds, Fetched: 1 row(s)
hbase(main):047:0> list
TABLE
xyz
1 row(s) in 17.3310 seconds
=> ["xyz"]
hbase(main):048:0> put 'xyz','1','cf1:val','abc'
0 row(s) in 7.9590 seconds
hbase(main):049:0> scan 'xyz'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1413433021803, value=abc
1 row(s) in 1.1650 seconds
(4) Configure Hive's web interface:
apache-hive-0.13.1 does not ship a war package, so use hive-hwi-0.12.0.war from hive-0.12.0:
# cp /usr/local/hive-0.12.0/lib/hive-hwi-0.12.0.war /usr/local/apache-hive-0.13.1-bin/lib/
# chown hadoop.hadoop /usr/local/apache-hive-0.13.1-bin/lib/hive-hwi-0.12.0.war
Edit hive-site.xml:
<property>
<name>hive.hwi.war.file</name>
<!--<value>lib/hive-hwi-@VERSION@.war</value>-->
<value>lib/hive-hwi-0.12.0.war</value>
<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>
</property>
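The listen address and port can be set in the same file if the defaults need changing; hive.hwi.listen.host and hive.hwi.listen.port are standard Hive properties, and the values below are their usual defaults:
<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>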
$ hive --service hwi &
With the defaults above, the HWI page is served at http://<host>:9999/hwi; the original post showed screenshots of the HWI page and of the Hadoop web UI here.
This confirms that the Hive and HBase integration was successful.
This article comes from the "Linux之旅" blog; please keep this attribution: http://openlinuxfly.blog.51cto.com/7120723/1688797