Oracle Big Data Connectors in Practice 1: Loading HDFS Files into an Oracle Database with Oracle Loader for Hadoop

Date: 2015-12-11 01:20:42

  • Deploy the JDK, Hadoop, and OraLoader packages

Unpack each of the prepared packages into the hadoop user's home directory:

    hadoop-2.6.2.tar.gz  jdk-8u65-linux-x64.gz  oraloader-3.4.0.x86_64.zip

After unpacking, the home directory contains:

    ├── hadoop-2.6.2
    ├── jdk1.8.0_65
    ├── oraloader-3.4.0-h2
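
The unpacking step can be scripted. The sketch below is a minimal helper, not the author's exact commands: `unpack_pkg` is a hypothetical function name, and it assumes that `jdk-8u65-linux-x64.gz` is a gzipped tarball and that `tar` and `unzip` are installed.

```shell
# Unpack one archive into a destination directory, picking the tool by file extension.
# unpack_pkg is a hypothetical helper name; *.gz is assumed to be a gzipped tarball.
unpack_pkg() {
  mkdir -p "$2" || return 1
  case "$1" in
    *.tar.gz|*.tgz|*.gz) tar -xzf "$1" -C "$2" ;;
    *.zip)               unzip -q -o "$1" -d "$2" ;;
    *) echo "unsupported archive: $1" >&2; return 1 ;;
  esac
}

# Intended use, run from the directory that holds the downloads:
#   for pkg in hadoop-2.6.2.tar.gz jdk-8u65-linux-x64.gz oraloader-3.4.0.x86_64.zip; do
#     unpack_pkg "$pkg" "$HOME" || break
#   done
```

If the resulting directory names differ from the listing above, adjust the environment variables in the next step accordingly.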
  • Set the environment variables
    export JAVA_HOME=/home/hadoop/jdk1.8.0_65
    export HADOOP_USER_NAME=hadoop
    export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
    export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export OLH_HOME=/home/hadoop/oraloader-3.4.0-h2
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:$OLH_HOME/jlib/*
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
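
A mistyped path in these exports tends to surface only later as a confusing Hadoop error, so a quick sanity check can help. The following is a sketch under the assumption that the variables are already exported in the current shell; `check_env_dirs` is a hypothetical helper name, not part of Hadoop or OraLoader.

```shell
# Print a message for every named variable that is unset or not a directory.
# Returns non-zero if any check fails. check_env_dirs is a hypothetical helper name.
check_env_dirs() {
  _rc=0
  for _name in "$@"; do
    eval "_val=\${$_name:-}"          # indirect lookup of the variable called $_name
    if [ -z "$_val" ]; then
      echo "$_name is not set"; _rc=1
    elif [ ! -d "$_val" ]; then
      echo "$_name=$_val is not a directory"; _rc=1
    fi
  done
  return "$_rc"
}

# Example: check_env_dirs JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR OLH_HOME
```

The function prints nothing and returns 0 when every listed variable points at an existing directory.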
  • Configure Hadoop core-site.xml
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:8020</value>
      </property>
    </configuration>
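
To confirm what a configuration file actually contains, a property value can be pulled out on the command line. This is a minimal sketch that assumes each `<name>` and `<value>` element sits on its own line, as in the files in this article; `get_prop` is a hypothetical helper name, and a real XML parser would be more robust.

```shell
# Print the value of one property from a Hadoop-style XML configuration file.
# $1 = config file, $2 = property name. Assumes <name> and <value> are on separate lines.
get_prop() {
  awk -v want="$2" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$1"
}

# Example: get_prop "$HADOOP_CONF_DIR/core-site.xml" fs.defaultFS
```

On a machine where the Hadoop client is already configured, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.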
  • Configure Hadoop hdfs-site.xml
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/nn</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/dn</value>
      </property>
      <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///home/hadoop/dfs/sn</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.permissions.superusergroup</name>
        <value>supergroup</value>
      </property>
      <property>
        <name>dfs.namenode.http-address</name>
        <value>server1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>server1:50090</value>
      </property>
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
    </configuration>
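
The storage directories in hdfs-site.xml are local paths written as `file://` URIs. Creating them up front as the hadoop user is one way to rule out permission problems; whether Hadoop would create them on its own is not covered by the original article. The sketch below uses a hypothetical helper name, `ensure_local_dir`.

```shell
# Create the local directory behind a file:// URI and print its path.
# ensure_local_dir is a hypothetical helper name.
ensure_local_dir() {
  _path=${1#file://}          # file:///home/hadoop/dfs/nn -> /home/hadoop/dfs/nn
  mkdir -p "$_path" && printf '%s\n' "$_path"
}

# Example, as the hadoop user:
#   for uri in file:///home/hadoop/dfs/nn file:///home/hadoop/dfs/dn file:///home/hadoop/dfs/sn; do
#     ensure_local_dir "$uri"
#   done
```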
  • Configure Hadoop yarn-site.xml
    <configuration>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>server1:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>server1:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>server1:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>server1:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>server1:8088</value>
      </property>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///home/hadoop/yarn/local</value>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>file:///home/hadoop/yarn/logs</value>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/yarn/apps</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    </configuration>
  • Set up the OraLoader configuration file (OraLoadJobConf.xml)
    <?xml version="1.0" encoding="UTF-8" ?>
    <configuration>

      <!-- Input settings -->
      <property>
        <name>mapreduce.inputformat.class</name>
        <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
      </property>

      <property>
        <name>mapred.input.dir</name>
        <value>/catalog</value>
      </property>

      <property>
        <name>oracle.hadoop.loader.input.fieldTerminator</name>
        <value>\u002C</value>
      </property>
      <property>
        <name>oracle.hadoop.loader.input.fieldNames</name>
        <value>CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR</value>
      </property>

      <!-- Output settings -->
      <property>
        <name>mapreduce.job.outputformat.class</name>
        <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
      </property>

      <property>
        <name>mapreduce.output.fileoutputformat.outputdir</name>
        <value>oraloadout</value>
      </property>

      <!-- Table information -->
      <property>
        <name>oracle.hadoop.loader.loaderMap.targetTable</name>
        <value>catalog</value>
      </property>

      <!-- Connection information -->
      <property>
        <name>oracle.hadoop.loader.connection.url</name>
        <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}/${SERVICE_NAME}</value>
      </property>

      <property>
        <name>TCPPORT</name>
        <value>1521</value>
      </property>

      <property>
        <name>HOST</name>
        <value>192.168.56.101</value>
      </property>

      <property>
        <name>SERVICE_NAME</name>
        <value>orcl</value>
      </property>

      <property>
        <name>oracle.hadoop.loader.connection.user</name>
        <value>baron</value>
      </property>

      <property>
        <name>oracle.hadoop.loader.connection.password</name>
        <value>baron</value>
      </property>

    </configuration>
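
Hadoop's configuration loader expands `${NAME}` references inside a value using other properties, which is why HOST, TCPPORT, and SERVICE_NAME are defined as plain properties in this file. The snippet below only mimics that substitution with shell variables to show the JDBC URL the job should end up with; it does not invoke Hadoop, and the variable name `JDBC_URL` is chosen for illustration.

```shell
# Mimic the ${...} substitution with shell variables (values copied from the configuration file).
HOST=192.168.56.101
TCPPORT=1521
SERVICE_NAME=orcl
JDBC_URL="jdbc:oracle:thin:@${HOST}:${TCPPORT}/${SERVICE_NAME}"
echo "$JDBC_URL"   # prints jdbc:oracle:thin:@192.168.56.101:1521/orcl
```

Testing that URL with the same user from the Hadoop node before submitting the job helps separate connection problems from loader problems.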
  • Load the test data into HDFS
    $ cat catalog.txt
    1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
    2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi
    3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench

    $ hdfs dfs -mkdir /catalog
    $ hdfs dfs -put catalog.txt /catalog/catalog.txt
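
Each input line has to supply one value per name listed in `oracle.hadoop.loader.input.fieldNames`, six in this case. A quick check before uploading might look like the sketch below; `count_bad_rows` is a hypothetical helper name, and the check is deliberately naive: an empty line or a comma inside a field is counted as a bad row.

```shell
# Count the lines whose comma-separated field count differs from the expected number.
# $1 = data file, $2 = expected number of fields.
count_bad_rows() {
  awk -F',' -v n="$2" 'NF != n { bad++ } END { print bad + 0 }' "$1"
}

# Example: count_bad_rows catalog.txt 6
```

A result of 0 means every line has exactly the expected number of fields.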
  • Use OraLoader to load the HDFS file into the Oracle database
    $ hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadJobConf.xml -libjars $OLH_HOME/jlib/oraloader.jar

    Oracle Loader for Hadoop Release 3.4.0 - Production
    Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.
    15/12/07 08:35:52 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.4.0 - Production
    Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.
    15/12/07 08:35:52 INFO loader.OraLoader: Built-Against: hadoop-2.2.0 hive-0.13.0 avro-1.7.3 jackson-1.8.8
    15/12/07 08:35:52 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
    15/12/07 08:35:52 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
    15/12/07 08:36:27 INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
    15/12/07 08:36:29 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: CATALOG is not partitioned
    15/12/07 08:36:29 INFO output.DBOutputFormat: Setting reduce tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat
    15/12/07 08:36:29 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    15/12/07 08:36:32 WARN loader.OraLoader: Sampler is disabled because the number of reduce tasks is less than two. Job will continue without sampled information.
    15/12/07 08:36:32 INFO loader.OraLoader: Submitting OraLoader job OraLoader
    15/12/07 08:36:32 INFO client.RMProxy: Connecting to ResourceManager at server1/192.168.56.101:8032
    15/12/07 08:36:34 INFO input.FileInputFormat: Total input paths to process : 1
    15/12/07 08:36:34 INFO mapreduce.JobSubmitter: number of splits:1
    15/12/07 08:36:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449494864827_0001
    15/12/07 08:36:36 INFO impl.YarnClientImpl: Submitted application application_1449494864827_0001
    15/12/07 08:36:37 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1449494864827_0001/
    15/12/07 08:37:05 INFO loader.OraLoader: map 0% reduce 0%
    15/12/07 08:37:22 INFO loader.OraLoader: map 100% reduce 0%
    15/12/07 08:37:36 INFO loader.OraLoader: map 100% reduce 67%
    15/12/07 08:38:05 INFO loader.OraLoader: map 100% reduce 100%
    15/12/07 08:38:06 INFO loader.OraLoader: Job complete: OraLoader (job_1449494864827_0001)
    15/12/07 08:38:06 INFO loader.OraLoader: Counters: 49
        File System Counters
            FILE: Number of bytes read=395
            FILE: Number of bytes written=244157
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=367
            HDFS: Number of bytes written=1861
            HDFS: Number of read operations=7
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=5
        Job Counters
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=12516
            Total time spent by all reduces in occupied slots (ms)=40696
            Total time spent by all map tasks (ms)=12516
            Total time spent by all reduce tasks (ms)=40696
            Total vcore-seconds taken by all map tasks=12516
            Total vcore-seconds taken by all reduce tasks=40696
            Total megabyte-seconds taken by all map tasks=12816384
            Total megabyte-seconds taken by all reduce tasks=41672704
        Map-Reduce Framework
            Map input records=3
            Map output records=3
            Map output bytes=383
            Map output materialized bytes=395
            Input split bytes=104
            Combine input records=0
            Combine output records=0
            Reduce input groups=1
            Reduce shuffle bytes=395
            Reduce input records=3
            Reduce output records=3
            Spilled Records=6
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=556
            CPU time spent (ms)=9450
            Physical memory (bytes) snapshot=444141568
            Virtual memory (bytes) snapshot=4221542400
            Total committed heap usage (bytes)=331350016
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=263
        File Output Format Counters
            Bytes Written=1620
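
In the run above, `Map input records` and `Reduce output records` are both 3, matching the three lines in catalog.txt. If the job output is saved to a file, that comparison can be scripted. The sketch below assumes the counters appear as `name=value` lines, as in the log above; `counter_value` is a hypothetical helper name.

```shell
# Print the value of one job counter from a saved OraLoader/MapReduce log.
# $1 = log file, $2 = counter name exactly as printed before the "=" sign.
counter_value() {
  awk -F'=' -v name="$2" '
    { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }   # trim indentation and stray spaces
    key == name { print $2; exit }
  ' "$1"
}

# Example, with the job output saved to a file named load.log (an assumed name):
#   [ "$(counter_value load.log 'Map input records')" = "$(counter_value load.log 'Reduce output records')" ] \
#     && echo "record counts match"
```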

  • Once the load finishes, verify the result

     CATALOGID JOURNAL                   PUBLISHER                 EDITION                   TITLE                                    AUTHOR
    ---------- ------------------------- ------------------------- ------------------------- ---------------------------------------- ---------------------
             1 Oracle Magazine           Oracle Publishing         Nov-Dec 2004              Database Resource Manager                Kimberly Floss
             2 Oracle Magazine           Oracle Publishing         Nov-Dec 2004              From ADF UIX to JSF                      Jonas Jacobi
             3 Oracle Magazine           Oracle Publishing         March-April 2005          Starting with Oracle ADF                 Steve Muench

Original article: http://www.cnblogs.com/panwenyu/p/5037716.html
