This is a hands-on guide to a classic Hadoop 2 high-availability deployment: HA mode based on QJN (Quorum Journal Nodes), without the Federation module.
Hadoop cluster nodes:

IP | Hostname | NameNode | ZKFC | DataNode | ResourceManager | NodeManager | JournalNode (QJN) |
---|---|---|---|---|---|---|---|
10.71.84.237 | hadoop201 | Y | Y | Y | Y | Y | Y |
10.71.84.223 | hadoop202 | Y | Y | Y | Y | Y | Y |
10.71.84.222 | hadoop203 | N | N | Y | N | Y | Y |
10.71.84.238 | hadoop204 | N | N | Y | N | Y | N |
ZooKeeper ensemble (used by the ZKFCs and the ResourceManagers):

IP | Hostname |
---|---|
10.71.83.14 | hadoop10 |
10.71.84.16 | hadoop12 |
10.71.84.17 | hadoop13 |
Install CentOS 6.4 x64 on every machine (installation steps omitted).
After installation, initialize the system configuration on each machine:
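#Add the hadoop group
groupadd -g 4000 hadoop
#Add the hadoop user (the password is set with passwd below; useradd -p would expect an already-encrypted password, not plain text)
useradd -g hadoop -c "hadoopuser" -u 3001 -m -d /home/hadoop hadoop
#Set the hadoop user's password
passwd hadoop
#Create the application directory for the cluster's data and computation
mkdir -p /app/hadoop
chown hadoop:hadoop /app/hadoop
#As root, install emacs
yum install -y emacs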
#Change the hostname; each machine gets its own name (hadoop201 shown here)
hostname hadoop201
emacs -nw /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop201
emacs -nw /etc/hosts
10.71.84.237 hadoop201
10.71.84.223 hadoop202
10.71.84.222 hadoop203
10.71.84.238 hadoop204
10.71.83.14 hadoop10
10.71.83.16 hadoop12
10.71.83.17 hadoop13
emacs -nw /etc/security/limits.d/90-nproc.conf and add the following:
* soft nproc 1024
hadoop soft nproc 25535
hadoop hard nproc 65535
emacs -nw /etc/sysctl.conf and add the following (then run sysctl -p to apply it):
fs.file-max = 655350
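Because the lines above are added by hand in an editor, re-running the setup can leave duplicate entries. A minimal sketch of an idempotent alternative, assuming bash and GNU grep; `append_once` is a hypothetical helper, not part of the original procedure:

```bash
#!/usr/bin/env bash
# Append LINE to FILE only if that exact line is not already present,
# so the system-initialization steps can be re-run safely.
# Usage: append_once FILE LINE
append_once() {
  local file=$1 line=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    # Start on a fresh line if the file does not end with a newline
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}
```

For example, as root: `append_once /etc/security/limits.d/90-nproc.conf 'hadoop hard nproc 65535'` and `append_once /etc/sysctl.conf 'fs.file-max = 655350'`, followed by `sysctl -p`.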
Set up time synchronization with an NTP server:
1. yum install -y ntp
2. emacs -nw /etc/ntp.conf
>Comment out:
>#server 0.centos.pool.ntp.org iburst
>#server 1.centos.pool.ntp.org iburst
>#server 2.centos.pool.ntp.org iburst
>#server 3.centos.pool.ntp.org iburst
>Add a national authoritative time server (provided by Beijing University of Posts and Telecommunications):
>server s1a.time.edu.cn
3. chkconfig --level 345 ntpd on
4. ntpdate s1a.time.edu.cn
5. service ntpd start
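Step 2 can also be scripted. The sketch below assumes GNU sed (as shipped with CentOS) and the default `N.centos.pool.ntp.org` entries shown above; it comments those servers out and adds `s1a.time.edu.cn` exactly once. `configure_ntp` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Comment out the default CentOS pool servers in an ntp.conf-style file
# and add a replacement server line if it is not already there.
# Usage: configure_ntp FILE [SERVER]
configure_ntp() {
  local conf=$1 server=${2:-s1a.time.edu.cn}
  # Prefix "#" to every active "server N.centos.pool.ntp.org ..." line;
  # already-commented lines start with "#" and are left alone.
  sed -i 's/^\(server [0-9]\.centos\.pool\.ntp\.org.*\)$/#\1/' "$conf"
  # Add the replacement server only once
  grep -qxF "server $server" "$conf" || printf 'server %s\n' "$server" >> "$conf"
}
```

Run it as root on `/etc/ntp.conf` before steps 3-5. Keeping `ntpdate` (step 4) ahead of `service ntpd start` (step 5) matters, because `ntpdate` cannot bind the NTP port while `ntpd` is already running.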
$ emacs -nw /home/hadoop/.bash_profile
export JAVA_HOME=/app/hadoop/java/jdk1.6.0_38
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export HADOOP_HOME=/app/hadoop/hadoop/hadoop-2.5.2
export PATH=/usr/sbin:$PATH
export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HOME/bin:$PATH
(The Hadoop binary release provided by Apache is a 32-bit build; download a 64-bit build instead.)
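One way to confirm that a build's native library is 64-bit is to inspect it with the standard `file` utility. The sketch below only classifies the text that `file` prints. The library path in the usage line (`$HADOOP_HOME/lib/native/libhadoop.so`) is an assumption about the tarball layout, and `is_elf64` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Succeed if a `file` description says the object is a 64-bit ELF binary.
# Usage: is_elf64 "$(file -L path/to/library.so)"
is_elf64() {
  case $1 in
    *"ELF 64-bit"*) return 0 ;;  # e.g. "ELF 64-bit LSB shared object, x86-64"
    *) return 1 ;;               # 32-bit ELF, or not an ELF file at all
  esac
}
```

Usage: `is_elf64 "$(file -L "$HADOOP_HOME/lib/native/libhadoop.so")" && echo "64-bit"`. The `-L` option makes `file` follow the symlink to the real library file.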
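Passwordless SSH. Machines involved: hadoop201, hadoop202
Configuration: omitted
Result: hadoop201 and hadoop202 can log in to each other over SSH without a password (the sshfence fencing method configured in hdfs-site.xml depends on this)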
core-site.xml (/app/hadoop/hadoop/hadoop-2.5.2/etc/hadoop/core-site.xml):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/hadoop/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop10:2181,hadoop12:2181,hadoop13:2181</value>
</property>
</configuration>
hdfs-site.xml (/app/hadoop/hadoop/hadoop-2.5.2/etc/hadoop/hdfs-site.xml):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>cluster1</value>
</property>
<property>
<name>dfs.ha.namenodes.cluster1</name>
<value>hadoop201,hadoop202</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster1.hadoop201</name>
<value>hadoop201:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster1.hadoop201</name>
<value>hadoop201:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster1.hadoop202</name>
<value>hadoop202:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster1.hadoop202</name>
<value>hadoop202:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop201:8485;hadoop202:8485;hadoop203:8485/cluster1</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.cluster1</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.cluster1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/app/hadoop/hadoop/tmp/journal</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
</configuration>
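The `dfs.namenode.shared.edits.dir` value follows the pattern `qjournal://host1:port1;host2:port2;host3:port3/journalId`, with JournalNode hosts separated by semicolons and, here, the nameservice `cluster1` as the journal ID. A small sketch that builds it from a host list, assuming every JournalNode uses the default RPC port 8485 as in this configuration; `qjournal_uri` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Build a dfs.namenode.shared.edits.dir value from a journal ID and hosts.
# Assumes all JournalNodes listen on the default port 8485.
# Usage: qjournal_uri JOURNAL_ID HOST...
qjournal_uri() {
  local jid=$1; shift
  # A write needs a majority of JournalNodes, so an even count adds no fault tolerance
  if (( $# % 2 == 0 )); then
    echo "warning: use an odd number of JournalNodes (>= 3)" >&2
  fi
  local hosts=() h
  for h in "$@"; do
    hosts+=("$h:8485")
  done
  local IFS=';'  # "${hosts[*]}" joins the elements with the first IFS character
  printf 'qjournal://%s/%s\n' "${hosts[*]}" "$jid"
}
```

`qjournal_uri cluster1 hadoop201 hadoop202 hadoop203` prints the exact value used in hdfs-site.xml above.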
mapred-site.xml (/app/hadoop/hadoop/hadoop-2.5.2/etc/hadoop/mapred-site.xml):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml (/app/hadoop/hadoop/hadoop-2.5.2/etc/hadoop/yarn-site.xml):
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop201</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop202</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop10:2181,hadoop12:2181,hadoop13:2181</value>
<description>For multiple zk services, separate them with comma</description>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-ha</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
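When checking that every node carries the same configuration, it helps to read single values out of these files. A rough sketch, assuming the one-element-per-line layout used above (it is not a general XML parser); `get_prop` is a hypothetical helper. On a running cluster, `hdfs getconf -confKey <key>` reports the effective value of an HDFS key instead.

```bash
#!/usr/bin/env bash
# Print the <value> of a property from a Hadoop *-site.xml file that uses
# one element per line, as in the files above.
# Usage: get_prop FILE PROPERTY_NAME
get_prop() {
  local file=$1 name=$2
  awk -v want="<name>$name</name>" '
    # index() is a literal substring match, so dots in the name are safe
    index($0, want) { found = 1; next }
    # first <value> line after the matching <name>: strip the tags and print
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
      print; exit
    }
  ' "$file"
}
```

For example, `get_prop /app/hadoop/hadoop/hadoop-2.5.2/etc/hadoop/yarn-site.xml yarn.resourcemanager.hostname.rm1` should print `hadoop201`. Because it takes the first `<value>` after the matching `<name>`, a property with a missing value would be misread.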
Start the JournalNodes. Machines involved: hadoop201, hadoop202, hadoop203
Steps: log in to each machine as the hadoop user and run nohup /app/hadoop/hadoop/hadoop-2.5.2/sbin/hadoop-daemon.sh start journalnode &
Start the NameNodes. Machines involved: hadoop201, hadoop202
Steps:
1. Log in to hadoop201 as the hadoop account and run /app/hadoop/hadoop/hadoop-2.5.2/bin/hdfs zkfc -formatZK
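2. Still on hadoop201, format the NameNode with /app/hadoop/hadoop/hadoop-2.5.2/bin/hdfs namenode -format (first-time setup only; the JournalNodes must already be running), then run nohup /app/hadoop/hadoop/hadoop-2.5.2/sbin/hadoop-daemon.sh start namenode &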
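3. Log in to hadoop202 as the hadoop account and run /app/hadoop/hadoop/hadoop-2.5.2/bin/hdfs namenode -bootstrapStandby
4. Still on hadoop202, run nohup /app/hadoop/hadoop/hadoop-2.5.2/sbin/hadoop-daemon.sh start namenode &
Finally, use jps to check the processes and inspect the NN logs under the logs directory for errors, making sure both NN processes have started correctly.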
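The "check with jps" step repeats after every start-up in this procedure. A sketch that checks `jps` output for the expected daemon names, assuming the usual `<pid> <MainClass>` line format; `check_daemons` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Check that every expected daemon name appears in `jps` output.
# Prints the missing names to stderr and returns 1 if any are missing.
# Usage: check_daemons "$(jps)" DAEMON...
check_daemons() {
  local jps_out=$1; shift
  local missing=() d
  for d in "$@"; do
    # Compare the whole second field, so "NameNode" does not match "SecondaryNameNode"
    if ! awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' <<<"$jps_out"; then
      missing+=("$d")
    fi
  done
  if (( ${#missing[@]} )); then
    echo "missing: ${missing[*]}" >&2
    return 1
  fi
}
```

On hadoop201 after this step, for example: `check_daemons "$(jps)" JournalNode NameNode`. Later steps add `DFSZKFailoverController` (the ZKFC), `DataNode`, `ResourceManager` and `NodeManager`.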
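Start the ZKFCs. Machines involved: hadoop201, hadoop202
Steps:
1. Log in to hadoop201 as the hadoop account and run nohup /app/hadoop/hadoop/hadoop-2.5.2/sbin/hadoop-daemon.sh start zkfc &
2. Log in to hadoop202 as the hadoop account and run nohup /app/hadoop/hadoop/hadoop-2.5.2/sbin/hadoop-daemon.sh start zkfc &
Finally, use jps to check the processes and inspect the FC logs under the logs directory for errors, making sure both FC processes have started correctly. At this point, one of the two NNs will be active and the other standby.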
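To confirm that exactly one NN became active, each NameNode ID from `dfs.ha.namenodes.cluster1` can be queried with `hdfs haadmin -getServiceState`, which prints `active` or `standby`. The check itself is sketched below as a hypothetical helper, `ha_states_ok`, so it can be tried without a cluster:

```bash
#!/usr/bin/env bash
# Succeed only if exactly one of the given HA states is "active"
# and all others are "standby".
# Usage: ha_states_ok STATE...
ha_states_ok() {
  local active=0 s
  for s in "$@"; do
    case $s in
      active) active=$((active + 1)) ;;
      standby) ;;
      *) echo "unexpected state: '$s'" >&2; return 1 ;;
    esac
  done
  [ "$active" -eq 1 ]
}
```

Usage: `ha_states_ok "$(hdfs haadmin -getServiceState hadoop201)" "$(hdfs haadmin -getServiceState hadoop202)"`. The same check applies to the ResourceManagers via `yarn rmadmin -getServiceState rm1` and `rm2`.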
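Start the DataNodes. Machines involved: hadoop201, hadoop202, hadoop203, hadoop204
Steps:
1. Log in to each machine as the hadoop account and run nohup /app/hadoop/hadoop/hadoop-2.5.2/sbin/hadoop-daemon.sh start datanode &
Finally, use jps to check the processes and inspect the DN logs under the logs directory for errors, making sure the DN processes have started correctly.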
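Start the ResourceManagers. Machines involved: hadoop201, hadoop202
Steps:
1. Log in to hadoop201 as the hadoop account and run nohup /app/hadoop/hadoop/hadoop-2.5.2/bin/yarn resourcemanager &
2. Log in to hadoop202 as the hadoop account and run nohup /app/hadoop/hadoop/hadoop-2.5.2/bin/yarn resourcemanager &
Finally, use jps to check the processes and inspect the RM logs under the logs directory for errors, making sure the YARN RM processes have started correctly.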
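Start the NodeManagers. Machines involved: hadoop201, hadoop202, hadoop203, hadoop204
Steps:
1. Log in to each machine as the hadoop account and run nohup /app/hadoop/hadoop/hadoop-2.5.2/bin/yarn nodemanager &
Finally, use jps to check the processes and inspect the NM logs under the logs directory for errors, making sure the YARN NM processes have started correctly.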
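We designed a test matrix with two dimensions: how the system fails, and how the client is connected.

Failure mode | Effect on the ZK lock | Simulates |
---|---|---|
Killing the NameNode process | The ZKFC releases the lock proactively | Machine OOM, deadlock, sudden hardware performance degradation, etc. |
Powering off the NN machine | The ZK lock times out | Network and switch failures, as well as power loss itself |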
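Already-connected clients (continuously copying 96 MB files with a 1 MB block size): with many blocks per file, we expect the client to keep asking the NN for new blocks. We usually triggered the failure near the end of the first file or at the start of the second.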
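The number of blocks each file needs is ceil(file size / block size): 96 MB at 1 MB per block gives 96 blocks, while 96 MB at 100 MB per block fits in a single block. A trivial sketch of that arithmetic (`block_count` is a hypothetical name):

```bash
#!/usr/bin/env bash
# Number of HDFS blocks a file of FILE_BYTES occupies with BLOCK_BYTES blocks,
# i.e. ceil(FILE_BYTES / BLOCK_BYTES) using integer arithmetic.
# Usage: block_count FILE_BYTES BLOCK_BYTES
block_count() {
  local size=$1 bs=$2
  echo $(( (size + bs - 1) / bs ))
}
```

The original does not say how the block size was set; one possibility is a generic option on the shell command, e.g. `hdfs dfs -D dfs.blocksize=1048576 -put <file> <dir>`. Note that 1 MB is also the default minimum block size (`dfs.namenode.fs-limits.min-block-size`).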
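Newly connecting clients (continuously copying 96 MB files with a 100 MB block size): because each file has only one block, a failure during the copy does not immediately cause an error on the client or DN, but the next client that connects finds no NN available from the very start. We usually triggered the failure near the end of the first file.

For each combination we repeated the test 10-30 times, copying 5 files into HDFS each time. Because the timing could not always be hit precisely, the failure sometimes happened only during the third or fourth file. In every case, after the run we fetched all files back from HDFS and checked each file's MD5 to verify data integrity.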
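The integrity check at the end of each run can be sketched as follows: fetch the files back into a separate directory (for example with `hdfs dfs -get`) and compare each file's MD5 with the local original. The directory layout is an assumption, and `verify_md5` is a hypothetical helper using GNU `md5sum`:

```bash
#!/usr/bin/env bash
# Compare the MD5 of every regular file in SRC_DIR with the same-named file
# in FETCHED_DIR. Prints MISSING/MISMATCH lines and returns 1 on any failure.
# Usage: verify_md5 SRC_DIR FETCHED_DIR
verify_md5() {
  local src=$1 got=$2 f name a b rc=0
  for f in "$src"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
    name=$(basename "$f")
    if [ ! -f "$got/$name" ]; then
      echo "MISSING $name"; rc=1; continue
    fi
    a=$(md5sum < "$f" | cut -d' ' -f1)
    b=$(md5sum < "$got/$name" | cut -d' ' -f1)
    if [ "$a" != "$b" ]; then
      echo "MISMATCH $name"; rc=1
    fi
  done
  return "$rc"
}
```

For example: `hdfs dfs -get '/ha-test/*' fetched/ && verify_md5 src/ fetched/`, where `/ha-test`, `src/` and `fetched/` are placeholder paths.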
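Results:

Failure mode | Failover time | Client retries | Client failures |
---|---|---|---|
ZKFC releases the lock | 5-8 s (edits must be synced) | Occasional (~10%) | Never |
ZK lock timeout | 15-20 s (timeout set to 10 s) | More frequent (~75%) | Occasional (~15%), only for already-connected clients |

Data integrity is preserved: the MD5 checks never failed, and whenever a write failed, the client received an exception.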
HDFS HA in Hadoop 2.0 essentially meets high-availability requirements.
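Hadoop 2 provides HA for both the NN and the RM, which gives production systems a high-availability guarantee, and YARN provides more efficient resource management for many kinds of distributed computing; the project itself recommends upgrading to Hadoop 2. In my view, Hadoop 2's biggest advantage is YARN's resource management: its advanced management model means that many newer distributed frameworks, such as Spark, Storm and Samza, can run on YARN.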
Web UIs:
NN (hadoop201): http://10.71.84.237:50070/dfshealth.html
NN (hadoop202): http://10.71.84.223:50070/dfshealth.html
YARN RM (hadoop201): http://10.71.84.237:8088/cluster
Original article: http://www.cnblogs.com/fengjian2016/p/5965242.html