Tags: hadoop 2.7.2, big data, hdfs, ha, high availability, qjm, nfs
The official Hadoop documentation describes two ways to implement HDFS HA: QJM (Quorum Journal Manager) and NFS. This document covers only the QJM deployment; for the NFS approach, refer to the official "HA with NFS" guide. HDFS HA (for the NameNode) mainly addresses the following two situations:
In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.
Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.
The key points of how HA works are as follows:
At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster.
In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called “JournalNodes” (JNs).
In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state.
In order to ensure this property and prevent the so-called “split-brain scenario,” the JournalNodes will only ever allow a single NameNode to be a writer at a time.
The JournalNode daemon is lightweight, so it can be co-located on the same machines as the NameNode, JobTracker, or YARN ResourceManager. For fault tolerance, an odd number of JournalNodes must be deployed, with a minimum of three, as the official documentation explains:
There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This will allow the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JNs, (i.e. 3, 5, 7, etc.)
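To make the quorum arithmetic concrete: with N JournalNodes, every edit must reach a majority of floor(N/2) + 1 nodes, so the system tolerates floor((N - 1) / 2) failures. A minimal shell illustration (the cluster sizes are just the odd counts mentioned above):

# Failures tolerated by a JournalNode quorum of size N
for N in 3 5 7; do
  echo "JNs=$N  majority=$(( N/2 + 1 ))  tolerated failures=$(( (N-1)/2 ))"
done
# JNs=3  majority=2  tolerated failures=1
# JNs=5  majority=3  tolerated failures=2
# JNs=7  majority=4  tolerated failures=3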
Building on the previous installment, which left three virtual machines with a working Hadoop cluster and ZooKeeper cluster, this chapter walks through deploying the HDFS HA environment. The steps are as follows:
1. Edit the hdfs-site.xml file. The complete file is pasted below; the properties between the dated "add start/end 20160712" comments are what this step adds or modifies (highlighted in blue in the original post).
[hadoop@hadoop01 hadoop]$ vi hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- add start 20160712 -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_dsa</value>
  </property>
  <!-- add end 20160712 -->

  <!-- add start by 20160623 -->
  <property>
    <name>dfs.replication</name>
    <!-- modify start 20160627 <value>1</value> -->
    <!-- modify start 20170712 <value>2</value> -->
    <value>3</value>
    <!-- modify end 20170712 -->
    <!-- modify end 20160627 -->
  </property>
  <!-- add end by 20160623 -->

  <!-- add start by 20160627 -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
  </property>
  <!-- add end by 20160627 -->
</configuration>
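As a quick sanity check (not part of the original walkthrough), the hdfs getconf command can confirm that the client actually sees the new HA properties; the output shown assumes the configuration above is in effect:

[hadoop@hadoop01 hadoop]$ hdfs getconf -confKey dfs.nameservices
mycluster
[hadoop@hadoop01 hadoop]$ hdfs getconf -confKey dfs.ha.namenodes.mycluster
nn1,nn2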
2. Edit the core-site.xml file. The complete file is pasted below; the values between the dated "add/modify" comments are what this step changes (highlighted in blue in the original post).
[hadoop@hadoop01 hadoop]$ vi core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- add start 20160623 -->
  <property>
    <name>fs.defaultFS</name>
    <!-- modify start 20160627 <value>hdfs://localhost:9000</value> -->
    <!-- modify start 20160712 <value>hdfs://hadoop01:9000</value> -->
    <value>hdfs://mycluster</value>
    <!-- modify end 20160712 -->
    <!-- modify end -->
  </property>
  <!-- add end 20160623 -->

  <!-- add start 20160712 -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/dfs/journaldata</value>
  </property>
  <!-- add end 20160712 -->

  <!-- add start by 20160627 -->
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
  </property>
  <!-- add end by 20160627 -->
</configuration>
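Two side notes on this file. First, the official HA guide places dfs.journalnode.edits.dir in hdfs-site.xml; putting it in core-site.xml also works in practice, because the HDFS daemons load both files into one merged configuration. Second, fs.defaultFS can be verified the same way as the hdfs-site.xml properties:

[hadoop@hadoop01 hadoop]$ hdfs getconf -confKey fs.defaultFS
hdfs://mycluster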
3. Distribute the configuration files to hadoop02 and hadoop03
# A Hadoop cluster is already deployed in this virtual environment; to set up
# the new HA cluster cleanly, log in as the hadoop user on each of the three
# machines and clear the old metadata directories:
[hadoop@hadoop01 ~]$ rm -rf /home/hadoop/dfs/*
[hadoop@hadoop02 ~]$ rm -rf /home/hadoop/dfs/*
[hadoop@hadoop03 ~]$ rm -rf /home/hadoop/dfs/*

# Change to the configuration directory on hadoop01 and copy it out
[hadoop@hadoop01 hadoop]$ cd /home/hadoop/hadoop-2.7.2/etc/hadoop
[hadoop@hadoop01 hadoop]$ scp * hadoop02:$PWD
[hadoop@hadoop01 hadoop]$ scp * hadoop03:$PWD

# Start a journalnode on each of the three machines
[hadoop@hadoop01 dfs]$ hadoop-daemon.sh start journalnode
[hadoop@hadoop02 dfs]$ hadoop-daemon.sh start journalnode
[hadoop@hadoop03 dfs]$ hadoop-daemon.sh start journalnode

# On hadoop01, format the namenode
[hadoop@hadoop01 dfs]$ hdfs namenode -format

# On hadoop01, start the primary namenode
[hadoop@hadoop01 ~]$ hadoop-daemon.sh start namenode

# On hadoop02, bootstrap the standby node
[hadoop@hadoop02 ~]$ hdfs namenode -bootstrapStandby

# Start the remaining cluster services; in the current context this command
# starts everything that is not yet running
[hadoop@hadoop01 ~]$ start-all.sh

# On initial startup both namenode services are in Standby state; on hadoop01,
# the following command switches nn1 to Active:
[hadoop@hadoop01 dfs]$ hdfs haadmin -transitionToActive nn1

# The following command manually triggers a failover: nn1 becomes Standby
# and nn2 becomes Active
[hadoop@hadoop01 dfs]$ hdfs haadmin -failover nn1 nn2
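To confirm which NameNode is Active after these transitions, hdfs haadmin can query each node by the IDs configured in dfs.ha.namenodes.mycluster; assuming the failover above succeeded, the expected output is:

[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn1
standby
[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn2
active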
4. Stop all services
# Run the following command to stop all services of the Hadoop cluster in HA mode
[hadoop@hadoop01 dfs]$ stop-all.sh
Stopping namenodes on [hadoop01 hadoop02]
hadoop02: stopping namenode
hadoop01: stopping namenode
hadoop01: stopping datanode
hadoop03: stopping datanode
hadoop02: stopping datanode
Stopping journal nodes [hadoop01 hadoop02 hadoop03]
hadoop02: stopping journalnode
hadoop01: stopping journalnode
hadoop03: stopping journalnode
stopping yarn daemons
stopping resourcemanager
hadoop01: stopping nodemanager
hadoop02: stopping nodemanager
hadoop03: stopping nodemanager
no proxyserver to stop
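Note that stop-all.sh (like start-all.sh) is marked deprecated in Hadoop 2.x. If you prefer the recommended scripts, the following sequence should be equivalent on this cluster (stop-dfs.sh also stops the JournalNodes because dfs.namenode.shared.edits.dir uses a qjournal:// URI):

[hadoop@hadoop01 ~]$ stop-yarn.sh
[hadoop@hadoop01 ~]$ stop-dfs.sh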
5. Start all services
# Run the following command to start all services of the Hadoop cluster in HA mode
[hadoop@hadoop01 dfs]$ start-all.sh
Starting namenodes on [hadoop01 hadoop02]
hadoop02: starting namenode, logging to hadoop-hadoop-namenode-hadoop02.out
hadoop01: starting namenode, logging to hadoop-hadoop-namenode-hadoop01.out
hadoop01: starting datanode, logging to hadoop-hadoop-datanode-hadoop01.out
hadoop03: starting datanode, logging to hadoop-hadoop-datanode-hadoop03.out
hadoop02: starting datanode, logging to hadoop-hadoop-datanode-hadoop02.out
Starting journal nodes [hadoop01 hadoop02 hadoop03]
hadoop02: starting journalnode, logging to hadoop-hadoop-journalnode-hadoop02.out
hadoop01: starting journalnode, logging to hadoop-hadoop-journalnode-hadoop01.out
hadoop03: starting journalnode, logging to hadoop-hadoop-journalnode-hadoop03.out
starting yarn daemons
starting resourcemanager, logging to yarn-hadoop-resourcemanager-hadoop01.out
hadoop01: starting nodemanager, logging to yarn-hadoop-nodemanager-hadoop01.out
hadoop02: starting nodemanager, logging to yarn-hadoop-nodemanager-hadoop02.out
hadoop03: starting nodemanager, logging to yarn-hadoop-nodemanager-hadoop03.out
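Because this walkthrough does not configure automatic failover (there is no dfs.ha.automatic-failover.enabled or ha.zookeeper.quorum setting), both NameNodes come back in Standby state after every restart, and one of them has to be promoted manually, as in step 3:

[hadoop@hadoop01 ~]$ hdfs haadmin -transitionToActive nn1
[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn1
active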
6. Check all processes related to the Hadoop services
[hadoop@hadoop01 dfs]$ jps
12144 NameNode
12468 JournalNode
12262 DataNode
12634 ResourceManager
12749 NodeManager
[hadoop@hadoop02 ~]$ jps
[hadoop@hadoop03 ~]$ jps
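As a convenience (not from the original post), the passwordless ssh already set up for the cluster can drive jps across all three nodes in one pass, assuming jps is on the PATH for non-interactive shells:

[hadoop@hadoop01 ~]$ for h in hadoop01 hadoop02 hadoop03; do echo "== $h =="; ssh $h jps; done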
7. Screenshots (omitted)
This article is from the "沈进群" blog. Please do not repost!
Original article: http://sjinqun.blog.51cto.com/8872791/1828642