
Apache Sqoop 1.99.3 + Hadoop 2.5.2 + MySQL 5.0.7: environment setup and data import/export

Published: 2016-05-12 12:04:30


Overview

While surveying the products in the Hadoop ecosystem, I took a close look at tools for importing and exporting data between an RDBMS and HDFS, and evaluated a number of similar products along the way. The conclusion: they are all either secondary development on top of Sqoop or web-UI wrappers around it; underneath, it is still Sqoop doing the work. Pentaho's PDI and Oracle's ODI are both built this way. Going further, Hortonworks' Sandbox, the Hue web UI, and Cloudera Manager are even more complete: they integrate nearly every product in the Hadoop stack, are not particularly complex to deploy, and are quite powerful.


About Sqoop

Apache Sqoop currently ships in two product lines: the Sqoop 1 series and the Sqoop 2 series. Sqoop 1 is the more mature of the two, with fewer bugs but a fairly monolithic architecture; its current stable release is 1.4.6. Sqoop 2 builds on Sqoop 1 with major improvements: the client and server are separated, and jobs and connections are managed centrally. From a usability standpoint it is much simpler than Sqoop 1, but deployment is more involved, and Sqoop 1 is not compatible with Sqoop 2, so existing application scripts mostly have to be rewritten. The long-term trend, though, is for Sqoop 2 to become the mainstream.

# Since 1.99.2, Sqoop 2 has been unable to import data into HBase; this is expected to be resolved in the Sqoop 2.0.0 stable release.


Environment setup

The setup follows the official documentation; the points that need special attention are listed below:

1. Changes needed in server/conf/sqoop.properties

  org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/sqoop;create=true

  The URL shown above is the default embedded-Derby repository. Here `sqoop` is a database created beforehand on the MySQL side and granted privileges; to store the repository in MySQL, point this URL at that database instead:

  create database sqoop ;

  create user sqoop identified by '123456';

  grant all privileges on sqoop.* to sqoop;

  flush privileges;
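If the repository is meant to live in this MySQL database rather than in the embedded Derby store, the repository JDBC settings in sqoop.properties would look roughly like the fragment below. Treat this as a sketch: the key names follow the standard sqoop.properties layout, but Derby is the backend Sqoop 1.99.3 ships and is tested with, so adapt carefully to your setup.

```properties
org.apache.sqoop.repository.jdbc.url=jdbc:mysql://your-mysql-ip:3306/sqoop
org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver
org.apache.sqoop.repository.jdbc.user=sqoop
org.apache.sqoop.repository.jdbc.password=123456
```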

2. Also in sqoop.properties, set the location of your Hadoop configuration directory (e.g. .../etc/hadoop/ of your install):

 org.apache.sqoop.submission.engine.mapreduce.configuration.directory=your-hadoop-cluster-location


3. In server/conf/catalina.properties, append all of the lib jars under hadoop/share to common.loader.

common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,your-hadoop-libs
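Typing the full list of Hadoop jar directories into common.loader by hand is error-prone. The small sketch below generates the value; the Tomcat entries are the stock defaults, while the Hadoop paths assume the usual share/hadoop layout of a 2.5.2 install and should be adapted to yours.

```python
# Sketch: assemble the common.loader value for catalina.properties.
# The Hadoop install path and subdirectory list are assumptions for a
# typical hadoop-2.5.2 layout; adjust them to your cluster.
hadoop_home = "/home/project/hadoop-2.5.2"
subdirs = ["common", "common/lib", "hdfs", "hdfs/lib",
           "mapreduce", "mapreduce/lib", "yarn", "yarn/lib"]
hadoop_libs = ",".join(f"{hadoop_home}/share/hadoop/{d}/*.jar" for d in subdirs)

# Stock Tomcat entries, kept exactly as in the default catalina.properties.
base = ("${catalina.base}/lib,${catalina.base}/lib/*.jar,"
        "${catalina.home}/lib,${catalina.home}/lib/*.jar,"
        "${catalina.home}/../lib/*.jar")
print("common.loader=" + base + "," + hadoop_libs)
```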


4. [Important] Modify Hadoop's yarn-site.xml, appending the following:

<property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
</property>
<property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>


Testing

Start the Hadoop and Sqoop environments.

1. Start Hadoop with the start-all.sh script

2. Start Sqoop

1. After the Sqoop server is started, output like the following appears:

[root@sv001 sqoop-1.99.3-bin-hadoop200]# ./bin/sqoop.sh server run
Sqoop home directory: /home/project/sqoop-1.99.3-bin-hadoop200
Setting SQOOP_HTTP_PORT:     12000
Setting SQOOP_ADMIN_PORT:     12001
Using   CATALINA_OPTS:       
Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE:   /home/project/sqoop-1.99.3-bin-hadoop200/server
Using CATALINA_HOME:   /home/project/sqoop-1.99.3-bin-hadoop200/server
Using CATALINA_TMPDIR: /home/project/sqoop-1.99.3-bin-hadoop200/server/temp
Using JRE_HOME:        /usr/java/jdk1.7.0_67
Using CLASSPATH:       /home/project/sqoop-1.99.3-bin-hadoop200/server/bin/bootstrap.jar
May 11, 2016 6:56:00 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
May 11, 2016 6:56:00 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-12000
May 11, 2016 6:56:00 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 634 ms
May 11, 2016 6:56:00 PM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
May 11, 2016 6:56:00 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.36
May 11, 2016 6:56:00 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive sqoop.war
2016-05-11 18:56:00,972 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(47)) - Booting up Sqoop server
2016-05-11 18:56:00,979 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(96)) - Starting config file poller thread
log4j: Parsing for [root] with value=[WARN, file].
log4j: Level token is [WARN].
log4j: Category root set to WARN
log4j: Parsing appender named "file".
log4j: Parsing layout options for "file".
log4j: Setting property [conversionPattern] to [%d{ISO8601} %-5p %c{2} [%l] %m%n].
log4j: End of parsing for "file".
log4j: Setting property [file] to [@LOGDIR@/sqoop.log].
log4j: Setting property [maxBackupIndex] to [5].
log4j: Setting property [maxFileSize] to [25MB].
log4j: setFile called: @LOGDIR@/sqoop.log, true
log4j: setFile ended
log4j: Parsed "file" options.
log4j: Parsing for [org.apache.sqoop] with value=[DEBUG].
log4j: Level token is [DEBUG].
log4j: Category org.apache.sqoop set to DEBUG
log4j: Handling log4j.additivity.org.apache.sqoop=[null]
log4j: Parsing for [org.apache.derby] with value=[INFO].
log4j: Level token is [INFO].
log4j: Category org.apache.derby set to INFO
log4j: Handling log4j.additivity.org.apache.derby=[null]
log4j: Finished configuring.
log4j: Could not find root logger information. Is this OK?
log4j: Parsing for [default] with value=[INFO,defaultAppender].
log4j: Level token is [INFO].
log4j: Category default set to INFO
log4j: Parsing appender named "defaultAppender".
log4j: Parsing layout options for "defaultAppender".
log4j: Setting property [conversionPattern] to [%d %-5p %c: %m%n].
log4j: End of parsing for "defaultAppender".
log4j: Setting property [file] to [@LOGDIR@/default.audit].
log4j: setFile called: @LOGDIR@/default.audit, true
log4j: setFile ended
log4j: Parsed "defaultAppender" options.
log4j: Handling log4j.additivity.default=[null]
log4j: Finished configuring.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/project/sqoop-1.99.3-bin-hadoop200/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/project/sqoop-1.99.3-bin-hadoop200/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/project/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
May 11, 2016 6:56:03 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory ROOT
May 11, 2016 6:56:03 PM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-12000
May 11, 2016 6:56:03 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 3605 ms

If you run the jps command, you will see a Bootstrap process; this confirms that the Sqoop server started successfully.


2. Start the Sqoop client

  Command: sqoop.sh client

[root@sv001 sqoop-1.99.3-bin-hadoop200]# ./bin/sqoop.sh client
Sqoop home directory: /home/project/sqoop-1.99.3-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000>


3. Test preparation and execution

Confirm the version information:

sqoop:000> show version -all
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b 
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
server version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b 
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
Protocol version:
  [1]

Set the server the client will talk to (the sqoop webapp):

set server --host localhost --port 12000 --webapp sqoop

sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully

Create a connection. When it succeeds, it looks like the following (the first attempt, with connector id 2, fails with a server-side exception; retrying with connector id 1 works):

sqoop:000> create connection --cid 2
Creating connection for connector with id 2
Exception has occurred during processing command 
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception
sqoop:000> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: test-mysql2hdfs

Connection configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://your-mysql-ip:3306/sqoop
Username: sqoop
Password: ******
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# 

Security related configuration options

Max connections: 10
New connection was successfully created with validation status FINE and persistent id 6

# Note: the connection string, username, and password must match what was prepared in MySQL beforehand, and the `sqoop` database must be consistent with the one referenced in sqoop.properties.

The connection created here has id = 6.


Using the newly created connection, create a job [MySQL --> HDFS], as follows:

sqoop:000> create job --xid 6 --type import
Creating job for connection with id 6
Please fill following values to create new job object
Name: importmysql2hdfs

Database configuration

Schema name: sqoop
Table name: t1
Table SQL statement: 
Table column names: 
Partition column name: id
Nulls in partition column: 
Boundary query: 

Output configuration

Storage type: 
  0 : HDFS
Choose: 0
Output format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
Choose: 0
Output directory: /sqoopuse

Throttling resources

Extractors: 
Loaders: 
New job was successfully created with validation status FINE  and persistent id 4

Note: the table definition and its data were also prepared beforehand on the MySQL side.

mysql> select * from t1;
+------+---------+----------+
| id   | int_col | char_col |
+------+---------+----------+
|    2 |       2 | b        |
|    4 |       4 | d        |
|    1 |       1 | a        |
|    3 |       3 | c        |
+------+---------+----------+
4 rows in set (0.00 sec)
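The "Partition column name" (id above) is what the import job splits on: Sqoop queries MIN(id) and MAX(id) and divides that range among the parallel map tasks (extractors). The function below is a simplified illustration of that split arithmetic, not Sqoop's actual implementation.

```python
# Simplified sketch of range-based split computation over an integer
# partition column. Real Sqoop handles more types and edge cases.
def splits(lo, hi, n):
    """Divide the inclusive range lo..hi into n contiguous sub-ranges."""
    step = (hi - lo + 1) / n
    bounds = [round(lo + step * i) for i in range(n)] + [hi + 1]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n)]

# ids in t1 run from 1 to 4; with 2 extractors:
print(splits(1, 4, 2))  # -> [(1, 2), (3, 4)]
```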

The data imported into HDFS is stored under the output directory /sqoopuse,

and the job id is 4.


Create the job [HDFS --> MySQL]:

sqoop:000> create job --xid 4 --type export
Creating job for connection with id 4
Please fill following values to create new job object
Name: hdfs2mysqlInfo

Database configuration

Schema name: sqoop
Table name: t1
Table SQL statement: 
Table column names: 
Stage table name: 
Clear stage table: 

Input configuration

Input directory: /sqoopuse

Throttling resources

Extractors: 
Loaders: 
New job was successfully created with validation status FINE  and persistent id 11
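Conceptually, the export job does the reverse of the import: each line of the HDFS text files is parsed back into a row and written to t1 through JDBC. A toy sketch of that mapping (real Sqoop generates code and uses batched JDBC prepared statements, not literal SQL strings):

```python
# Toy sketch: turn one HDFS text line back into an INSERT for table t1.
# Illustration only; not how Sqoop actually builds its statements.
def line_to_insert(line, table="t1"):
    values = ", ".join(line.split(","))
    return f"INSERT INTO {table} VALUES ({values});"

print(line_to_insert("1,1,'a'"))  # -> INSERT INTO t1 VALUES (1, 1, 'a');
```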


4. Running the tests

1. Start the job [MySQL --> HDFS]:

sqoop:000> start job --jid 4
Submission details
Job ID: 4
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-05-11 19:19:53 JST
Lastly updated by: root
External ID: job_1462962692840_0001
        http://sv004:8088/proxy/application_1462962692840_0001/
2016-05-11 19:19:53 JST: BOOTING  - Progress is not available

2. Wait about 30 seconds, then check the job status:

sqoop:000> status job --jid 4
Submission details
Job ID: 4
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-05-11 19:37:16 JST
Lastly updated by: root
External ID: job_1462962692840_0001
        http://sv004:8088/proxy/application_1462962692840_0001/
2016-05-11 19:37:57 JST: SUCCEEDED 
Counters:
        org.apache.hadoop.mapreduce.JobCounter
                SLOTS_MILLIS_MAPS: 38212
                MB_MILLIS_MAPS: 39129088
                TOTAL_LAUNCHED_MAPS: 3
                MILLIS_MAPS: 38212
                VCORES_MILLIS_MAPS: 38212
                SLOTS_MILLIS_REDUCES: 0
                OTHER_LOCAL_MAPS: 3
        org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
                BYTES_READ: 0
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
                BYTES_WRITTEN: 32
        org.apache.hadoop.mapreduce.TaskCounter
                MAP_INPUT_RECORDS: 0
                MERGED_MAP_OUTPUTS: 0
                PHYSICAL_MEMORY_BYTES: 497262592
                SPILLED_RECORDS: 0
                FAILED_SHUFFLE: 0
                CPU_MILLISECONDS: 3520
                COMMITTED_HEAP_BYTES: 603979776
                VIRTUAL_MEMORY_BYTES: 2741444608
                MAP_OUTPUT_RECORDS: 4
                SPLIT_RAW_BYTES: 346
                GC_TIME_MILLIS: 96
        org.apache.hadoop.mapreduce.FileSystemCounter
                FILE_READ_OPS: 0
                FILE_WRITE_OPS: 0
                FILE_BYTES_READ: 0
                FILE_LARGE_READ_OPS: 0
                HDFS_BYTES_READ: 346
                FILE_BYTES_WRITTEN: 318117
                HDFS_LARGE_READ_OPS: 0
                HDFS_BYTES_WRITTEN: 32
                HDFS_READ_OPS: 12
                HDFS_WRITE_OPS: 6
        org.apache.sqoop.submission.counter.SqoopCounters
                ROWS_READ: 4
Job executed successfully

3. Inspect the output files stored in HDFS:

[root@sv001 bin]# ./hadoop fs -ls /sqoopuse
16/05/11 19:43:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
-rw-r--r--   3 root supergroup          0 2016-05-11 19:37 /sqoopuse/_SUCCESS
-rw-r--r--   3 root supergroup          8 2016-05-11 19:37 /sqoopuse/part-m-00000
-rw-r--r--   3 root supergroup          8 2016-05-11 19:37 /sqoopuse/part-m-00001
-rw-r--r--   3 root supergroup         16 2016-05-11 19:37 /sqoopuse/part-m-00002

4. Confirm the imported data:

[root@sv001 bin]# ./hadoop fs -cat /sqoopuse/part*
16/05/11 19:43:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,1,'a'
2,2,'b'
4,4,'d'
3,3,'c'

This matches the data in MySQL, which shows that no data was lost during the import.
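The two sides list the rows in different orders, so "no data lost" here means the row sets match, not that line order is preserved. A quick sketch of that check, using the literal values shown above:

```python
# Check that the rows read back from HDFS match the MySQL table.
# Order differs between the two sides, so compare as sets.
mysql_rows = {(2, 2, "b"), (4, 4, "d"), (1, 1, "a"), (3, 3, "c")}
hdfs_lines = ["1,1,'a'", "2,2,'b'", "4,4,'d'", "3,3,'c'"]  # hadoop fs -cat output

def parse(line):
    row_id, int_col, char_col = line.split(",")
    return int(row_id), int(int_col), char_col.strip("'")

assert {parse(line) for line in hdfs_lines} == mysql_rows
print("row sets match")
```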


Test the job [HDFS --> MySQL]

1. Clear the data on the MySQL side:

mysql> select * from t1;
+------+---------+----------+
| id   | int_col | char_col |
+------+---------+----------+
|    2 |       2 | b        |
|    4 |       4 | d        |
|    1 |       1 | a        |
|    3 |       3 | c        |
+------+---------+----------+
4 rows in set (0.00 sec)

mysql> delete from t1;
Query OK, 4 rows affected (0.27 sec)

mysql> select * from t1;
Empty set (0.00 sec)

mysql>

2. Start the job [HDFS --> MySQL]:

sqoop:000> start job --jid 11
Submission details
Job ID: 11
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-05-11 19:50:42 JST
Lastly updated by: root
External ID: job_1462962692840_0002
        http://sv004:8088/proxy/application_1462962692840_0002/
2016-05-11 19:50:42 JST: BOOTING  - Progress is not available

3. Check the job's running status:

sqoop:000> status job --jid 11
Submission details
Job ID: 11
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-05-11 19:50:42 JST
Lastly updated by: root
External ID: job_1462962692840_0002
        http://sv004:8088/proxy/application_1462962692840_0002/
2016-05-11 19:51:39 JST: SUCCEEDED 
Counters:
        org.apache.hadoop.mapreduce.JobCounter
                SLOTS_MILLIS_MAPS: 204363
                MB_MILLIS_MAPS: 209267712
                TOTAL_LAUNCHED_MAPS: 8
                MILLIS_MAPS: 204363
                VCORES_MILLIS_MAPS: 204363
                SLOTS_MILLIS_REDUCES: 0
                OTHER_LOCAL_MAPS: 8
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
                BYTES_WRITTEN: 0
        org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
                BYTES_READ: 0
        org.apache.hadoop.mapreduce.TaskCounter
                MAP_INPUT_RECORDS: 0
                MERGED_MAP_OUTPUTS: 0
                PHYSICAL_MEMORY_BYTES: 1327665152
                SPILLED_RECORDS: 0
                COMMITTED_HEAP_BYTES: 1610612736
                CPU_MILLISECONDS: 7590
                FAILED_SHUFFLE: 0
                VIRTUAL_MEMORY_BYTES: 7262990336
                SPLIT_RAW_BYTES: 1224
                MAP_OUTPUT_RECORDS: 4
                GC_TIME_MILLIS: 316
        org.apache.hadoop.mapreduce.FileSystemCounter
                FILE_WRITE_OPS: 0
                FILE_READ_OPS: 0
                FILE_LARGE_READ_OPS: 0
                FILE_BYTES_READ: 0
                HDFS_BYTES_READ: 1320
                FILE_BYTES_WRITTEN: 839664
                HDFS_LARGE_READ_OPS: 0
                HDFS_WRITE_OPS: 0
                HDFS_READ_OPS: 32
                HDFS_BYTES_WRITTEN: 0
        org.apache.sqoop.submission.counter.SqoopCounters
                ROWS_READ: 4
Job executed successfully

4. In the MySQL client, confirm that the export succeeded and that no data was lost:


mysql> select * from t1;                <-------- verify with select
+------+---------+----------+
| id   | int_col | char_col |
+------+---------+----------+
|    1 |       1 | a        |
|    2 |       2 | b        |
|    4 |       4 | d        |
|    3 |       3 | c        |
+------+---------+----------+
4 rows in set (0.00 sec)

The export succeeded, and no data was lost.


---over----


Original article: http://blog.csdn.net/huyangshu87/article/details/51372495
