
Connecting Oracle and Hadoop (3): Using OLH to Load HBase into Oracle

Published: 2015-12-19 06:34:45


OLH is short for Oracle Loader for Hadoop, a component of Oracle's Big Data Connectors suite.


This article shows how to use OLH to load an HBase table into an Oracle database.

  • Prerequisite: Hadoop, Hive, HBase, and the OLH software are already deployed
    [hadoop@server1 ~]$ tree -L 1
    ├── hadoop-2.6.2
    ├── hbase-1.1.2
    ├── hive-1.1.1
    ├── jdk1.8.0_65
    ├── oraloader-3.4.0
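The load command later in this article relies on several environment variables (OLH_HOME, HIVE_HOME, HIVE_CONF_DIR). A minimal sketch of what they might look like for the directory layout above — the exact paths are assumptions derived from that listing:

```shell
# Assumed environment for the OLH job, derived from the directory layout
# above; adjust the paths to your installation.
export JAVA_HOME=/home/hadoop/jdk1.8.0_65
export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
export HIVE_HOME=/home/hadoop/hive-1.1.1
export HIVE_CONF_DIR=$HIVE_HOME/conf
export OLH_HOME=/home/hadoop/oraloader-3.4.0
echo "$OLH_HOME"
```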

 

  • Create the HBase table and load some WebLogic Server log data
    create 'WLSSERVER', 'LOG'
    put 'WLSSERVER', 'row1', 'LOG:TIME_STAMP', 'Apr-8-2014-7:06:16-PM-PDT'
    put 'WLSSERVER', 'row1', 'LOG:CATEGORY', 'Notice'
    put 'WLSSERVER', 'row1', 'LOG:TYPE', 'WebLogicServer'
    put 'WLSSERVER', 'row1', 'LOG:SERVERNAME', 'AdminServer'
    put 'WLSSERVER', 'row1', 'LOG:CODE', 'BEA-000365'
    put 'WLSSERVER', 'row1', 'LOG:MSG', 'Server state changed to STANDBY'
    put 'WLSSERVER', 'row2', 'LOG:TIME_STAMP', 'Apr-8-2014-7:06:17-PM-PDT'
    put 'WLSSERVER', 'row2', 'LOG:CATEGORY', 'Notice'
    put 'WLSSERVER', 'row2', 'LOG:TYPE', 'WebLogicServer'
    put 'WLSSERVER', 'row2', 'LOG:SERVERNAME', 'AdminServer'
    put 'WLSSERVER', 'row2', 'LOG:CODE', 'BEA-000365'
    put 'WLSSERVER', 'row2', 'LOG:MSG', 'Server state changed to STARTING'
    put 'WLSSERVER', 'row3', 'LOG:TIME_STAMP', 'Apr-8-2014-7:06:18-PM-PDT'
    put 'WLSSERVER', 'row3', 'LOG:CATEGORY', 'Notice'
    put 'WLSSERVER', 'row3', 'LOG:TYPE', 'WebLogicServer'
    put 'WLSSERVER', 'row3', 'LOG:SERVERNAME', 'AdminServer'
    put 'WLSSERVER', 'row3', 'LOG:CODE', 'BEA-000365'
    put 'WLSSERVER', 'row3', 'LOG:MSG', 'Server state changed to ADMIN'
    put 'WLSSERVER', 'row4', 'LOG:TIME_STAMP', 'Apr-8-2014-7:06:19-PM-PDT'
    put 'WLSSERVER', 'row4', 'LOG:CATEGORY', 'Notice'
    put 'WLSSERVER', 'row4', 'LOG:TYPE', 'WebLogicServer'
    put 'WLSSERVER', 'row4', 'LOG:SERVERNAME', 'AdminServer'
    put 'WLSSERVER', 'row4', 'LOG:CODE', 'BEA-000365'
    put 'WLSSERVER', 'row4', 'LOG:MSG', 'Server state changed to RESUMING'
    put 'WLSSERVER', 'row5', 'LOG:TIME_STAMP', 'Apr-8-2014-7:06:20-PM-PDT'
    put 'WLSSERVER', 'row5', 'LOG:CATEGORY', 'Notice'
    put 'WLSSERVER', 'row5', 'LOG:TYPE', 'WebLogicServer'
    put 'WLSSERVER', 'row5', 'LOG:SERVERNAME', 'AdminServer'
    put 'WLSSERVER', 'row5', 'LOG:CODE', 'BEA-000331'
    put 'WLSSERVER', 'row5', 'LOG:MSG', 'Started WebLogic AdminServer'
    put 'WLSSERVER', 'row6', 'LOG:TIME_STAMP', 'Apr-8-2014-7:06:21-PM-PDT'
    put 'WLSSERVER', 'row6', 'LOG:CATEGORY', 'Notice'
    put 'WLSSERVER', 'row6', 'LOG:TYPE', 'WebLogicServer'
    put 'WLSSERVER', 'row6', 'LOG:SERVERNAME', 'AdminServer'
    put 'WLSSERVER', 'row6', 'LOG:CODE', 'BEA-000365'
    put 'WLSSERVER', 'row6', 'LOG:MSG', 'Server state changed to RUNNING'
    put 'WLSSERVER', 'row7', 'LOG:TIME_STAMP', 'Apr-8-2014-7:06:22-PM-PDT'
    put 'WLSSERVER', 'row7', 'LOG:CATEGORY', 'Notice'
    put 'WLSSERVER', 'row7', 'LOG:TYPE', 'WebLogicServer'
    put 'WLSSERVER', 'row7', 'LOG:SERVERNAME', 'AdminServer'
    put 'WLSSERVER', 'row7', 'LOG:CODE', 'BEA-000360'
    put 'WLSSERVER', 'row7', 'LOG:MSG', 'Server started in RUNNING mode'
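The put statements above are highly repetitive: only the row key, the timestamp second, and a couple of CODE/MSG values change. A small shell loop can generate them — a sketch, where the output is meant to be piped into `hbase shell` after the table has been created:

```shell
# Emit the HBase shell put statements for rows 1-7 of WLSSERVER.
# Each input line is "CODE|MSG"; the timestamp second increments per row.
gen_puts() {
  i=0
  while IFS='|' read -r code msg; do
    i=$((i + 1))
    printf "put 'WLSSERVER', 'row%d', 'LOG:TIME_STAMP', 'Apr-8-2014-7:06:%d-PM-PDT'\n" "$i" "$((15 + i))"
    printf "put 'WLSSERVER', 'row%d', 'LOG:CATEGORY', 'Notice'\n" "$i"
    printf "put 'WLSSERVER', 'row%d', 'LOG:TYPE', 'WebLogicServer'\n" "$i"
    printf "put 'WLSSERVER', 'row%d', 'LOG:SERVERNAME', 'AdminServer'\n" "$i"
    printf "put 'WLSSERVER', 'row%d', 'LOG:CODE', '%s'\n" "$i" "$code"
    printf "put 'WLSSERVER', 'row%d', 'LOG:MSG', '%s'\n" "$i" "$msg"
  done
}
out=$(gen_puts <<'EOF'
BEA-000365|Server state changed to STANDBY
BEA-000365|Server state changed to STARTING
BEA-000365|Server state changed to ADMIN
BEA-000365|Server state changed to RESUMING
BEA-000331|Started WebLogic AdminServer
BEA-000365|Server state changed to RUNNING
BEA-000360|Server started in RUNNING mode
EOF
)
printf '%s\n' "$out"
```

Piping the printed statements into `hbase shell` loads the same data as the statements listed above.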

 

  • Create the Hive-on-HBase external table
    CREATE EXTERNAL TABLE wlsserver_hbase
    (
    key string,
    TIME_STAMP string,
    CATEGORY string,
    TYPE string,
    SERVERNAME string,
    CODE string,
    MSG string
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,LOG:TIME_STAMP,LOG:CATEGORY,LOG:TYPE,LOG:SERVERNAME,LOG:CODE,LOG:MSG")
    TBLPROPERTIES("hbase.table.name" = "WLSSERVER");
  • Create the Oracle target table
    CREATE TABLE wlsserver
      ( key VARCHAR2(4000),
        time_stamp VARCHAR2(4000),
        category VARCHAR2(4000),
        type VARCHAR2(4000),
        servername VARCHAR2(4000),
        code VARCHAR2(4000),
        msg VARCHAR2(4000)
      );

 

  • Create the OLH job configuration file (OraLoadJobConf-hive.xml)
    <?xml version="1.0" encoding="UTF-8" ?>
    <configuration>

       <!-- Input settings -->
       <property>
          <name>mapreduce.inputformat.class</name>
          <value>oracle.hadoop.loader.lib.input.HiveToAvroInputFormat</value>
       </property>
       <property>
          <name>oracle.hadoop.loader.input.hive.databaseName</name>
          <value>default</value>
       </property>
       <property>
          <name>oracle.hadoop.loader.input.hive.tableName</name>
          <value>wlsserver_hbase</value>
       </property>

       <!-- Output settings -->
       <property>
          <name>mapreduce.job.outputformat.class</name>
          <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
       </property>
       <property>
          <name>mapreduce.output.fileoutputformat.outputdir</name>
          <value>oraloadout</value>
       </property>

       <!-- Table information -->
       <property>
          <name>oracle.hadoop.loader.loaderMap.targetTable</name>
          <value>WLSSERVER</value>
       </property>

       <!-- Connection information -->
       <property>
          <name>oracle.hadoop.loader.connection.url</name>
          <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}</value>
       </property>
       <property>
          <name>TCPPORT</name>
          <value>1521</value>
       </property>
       <property>
          <name>HOST</name>
          <value>server1</value>
       </property>
       <property>
          <name>SID</name>
          <value>orcl</value>
       </property>
       <property>
          <name>oracle.hadoop.loader.connection.user</name>
          <value>baron</value>
       </property>
       <property>
          <name>oracle.hadoop.loader.connection.password</name>
          <value>baron</value>
       </property>
    </configuration>
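Note that Hadoop configuration files expand `${name}` references against other properties, which is why HOST, TCPPORT, and SID are declared as separate properties next to the connection URL. The URL therefore resolves as sketched below (shell is used here only to illustrate the substitution):

```shell
# Illustration of how the ${HOST}:${TCPPORT}:${SID} placeholders in the
# oracle.hadoop.loader.connection.url property resolve against the other
# properties defined in the same file.
HOST=server1
TCPPORT=1521
SID=orcl
url="jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}"
echo "$url"
```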

 

  • Run OLH to load the HBase table into the Oracle database

Two points need attention here: HADOOP_CLASSPATH must be extended with the OLH and Hive jars (plus the Hive configuration directory), and those same Hive jars must also be shipped to the cluster via -libjars:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OLH_HOME/jlib/*:$HIVE_HOME/lib/*:$HIVE_CONF_DIR
    hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadJobConf-hive.xml -libjars $OLH_HOME/jlib/oraloader.jar,$HIVE_HOME/lib/hive-exec-1.1.1.jar,$HIVE_HOME/lib/hive-metastore-1.1.1.jar,$HIVE_HOME/lib/libfb303-0.9.2.jar
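The comma-separated -libjars value in the second command can also be assembled programmatically. A sketch — the paths and jar versions are assumptions matching the Hive 1.1.1 install shown earlier and must match your own installation:

```shell
# Assemble the comma-separated -libjars value from the OLH jar and the
# Hive jars the OraLoader job needs at runtime.
OLH_HOME=/home/hadoop/oraloader-3.4.0
HIVE_HOME=/home/hadoop/hive-1.1.1
libjars="$OLH_HOME/jlib/oraloader.jar"
for j in hive-exec-1.1.1 hive-metastore-1.1.1 libfb303-0.9.2; do
  libjars="$libjars,$HIVE_HOME/lib/$j.jar"
done
echo "$libjars"
```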

The output looks like this:

 

Oracle Loader for Hadoop Release 3.4.0 - Production

Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hive-1.1.1/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

15/12/08 04:53:51 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.4.0 - Production

Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.

15/12/08 04:53:51 INFO loader.OraLoader: Built-Against: hadoop-2.2.0 hive-0.13.0 avro-1.7.3 jackson-1.8.8

15/12/08 04:53:51 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class

15/12/08 04:53:51 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir

15/12/08 04:54:23 INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication

15/12/08 04:54:24 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: CATALOG is not partitioned

15/12/08 04:54:24 INFO output.DBOutputFormat: Setting reduce tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat

15/12/08 04:54:24 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

15/12/08 04:54:26 WARN loader.OraLoader: Sampler is disabled because the number of reduce tasks is less than two. Job will continue without sampled information.

15/12/08 04:54:26 INFO loader.OraLoader: Submitting OraLoader job OraLoader

15/12/08 04:54:26 INFO client.RMProxy: Connecting to ResourceManager at server1/192.168.56.101:8032

15/12/08 04:54:28 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore

15/12/08 04:54:28 INFO metastore.ObjectStore: ObjectStore, initialize called

15/12/08 04:54:29 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored

15/12/08 04:54:29 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored

15/12/08 04:54:31 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"

15/12/08 04:54:33 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

15/12/08 04:54:33 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

15/12/08 04:54:34 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

15/12/08 04:54:34 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

15/12/08 04:54:34 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing

15/12/08 04:54:34 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL

15/12/08 04:54:34 INFO metastore.ObjectStore: Initialized ObjectStore

15/12/08 04:54:34 INFO metastore.HiveMetaStore: Added admin role in metastore

15/12/08 04:54:34 INFO metastore.HiveMetaStore: Added public role in metastore

15/12/08 04:54:35 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty

15/12/08 04:54:35 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=catalog

15/12/08 04:54:35 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=default tbl=catalog

15/12/08 04:54:36 INFO mapred.FileInputFormat: Total input paths to process : 1

15/12/08 04:54:36 INFO metastore.HiveMetaStore: 0: Shutting down the object store...

15/12/08 04:54:36 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Shutting down the object store...

15/12/08 04:54:36 INFO metastore.HiveMetaStore: 0: Metastore shutdown complete.

15/12/08 04:54:36 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Metastore shutdown complete.

15/12/08 04:54:37 INFO mapreduce.JobSubmitter: number of splits:2

15/12/08 04:54:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449544601730_0015

15/12/08 04:54:38 INFO impl.YarnClientImpl: Submitted application application_1449544601730_0015

15/12/08 04:54:38 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1449544601730_0015/

15/12/08 04:54:49 INFO loader.OraLoader: map 0% reduce 0%

15/12/08 04:55:07 INFO loader.OraLoader: map 100% reduce 0%

15/12/08 04:55:22 INFO loader.OraLoader: map 100% reduce 67%

15/12/08 04:55:47 INFO loader.OraLoader: map 100% reduce 100%

15/12/08 04:55:47 INFO loader.OraLoader: Job complete: OraLoader (job_1449544601730_0015)

15/12/08 04:55:47 INFO loader.OraLoader: Counters: 49

File System Counters

FILE: Number of bytes read=395

FILE: Number of bytes written=370110

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=6005

HDFS: Number of bytes written=1861

HDFS: Number of read operations=9

HDFS: Number of large read operations=0

HDFS: Number of write operations=5

Job Counters

Launched map tasks=2

Launched reduce tasks=1

Data-local map tasks=2

Total time spent by all maps in occupied slots (ms)=29809

Total time spent by all reduces in occupied slots (ms)=36328

Total time spent by all map tasks (ms)=29809

Total time spent by all reduce tasks (ms)=36328

Total vcore-seconds taken by all map tasks=29809

Total vcore-seconds taken by all reduce tasks=36328

Total megabyte-seconds taken by all map tasks=30524416

Total megabyte-seconds taken by all reduce tasks=37199872

Map-Reduce Framework

Map input records=3

Map output records=3

Map output bytes=383

Map output materialized bytes=401

Input split bytes=5610

Combine input records=0

Combine output records=0

Reduce input groups=1

Reduce shuffle bytes=401

Reduce input records=3

Reduce output records=3

Spilled Records=6

Shuffled Maps =2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms)=1245

CPU time spent (ms)=14220

Physical memory (bytes) snapshot=757501952

Virtual memory (bytes) snapshot=6360301568

Total committed heap usage (bytes)=535298048

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=1620



Original post: http://www.cnblogs.com/panwenyu/p/5058590.html
