
Hadoop Cluster Setup



Official homepage: http://www.centos.org/

Official Wiki: http://wiki.centos.org/

Official documentation (Chinese): http://wiki.centos.org/zh/Documentation

Installation guide: http://www.centos.org/docs/

1. Environment Preparation

Three virtual machines:

Version    IP                 Hostname
CentOS 7   192.168.136.129    master
CentOS 7   192.168.136.130    slave1
CentOS 7   192.168.136.131    slave2
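The rest of the guide assumes these hostnames resolve from every node, although the original never shows that step. A typical setup (run on each node, adjusting the hostname accordingly) would be:

hostnamectl set-hostname master    # use slave1 / slave2 on the other nodes

# Make all three names resolvable on every node
cat >> /etc/hosts <<EOF
192.168.136.129 master
192.168.136.130 slave1
192.168.136.131 slave2
EOF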

 

JDK download: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Hadoop download: https://archive.apache.org/dist/hadoop/common/

2. Setup (start on the master node)

2.1 Disable the firewall and SELinux

[root@master ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 
[root@master ~]# uname -r
3.10.0-862.el7.x86_64
[root@master ~]# sestatus
SELinux status:                 disabled
[root@master ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

# Disable the firewall
# Disable SELinux
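The transcript above only verifies that both are already off. If they are still enabled on your machines, the standard CentOS 7 commands are:

# Stop the firewall now and keep it off after reboots
systemctl stop firewalld
systemctl disable firewalld

# Put SELinux in permissive mode immediately, then disable it permanently
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config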

2.2 Create the hadoop user

[root@master ~]# useradd hadoop
[root@master ~]# id hadoop
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)
[root@master ~]# passwd hadoop
Changing password for user hadoop.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.

2.3 Configure the Java environment

Create a jdk directory, unpack the JDK tarball, and move its contents into that directory. (The commands below use JDK 7u79; adjust the filenames if you downloaded the JDK 8 build linked above.)

[root@master ~]# mkdir /usr/local/jdk
[root@master ~]# tar xf jdk-7u79-linux-x64.tar.gz 
[root@master ~]# mv jdk1.7.0_79/* /usr/local/jdk/

sed -i.ori '$a export JAVA_HOME=/usr/local/jdk\nexport PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH\nexport CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar' /etc/profile
source /etc/profile


[root@master ~]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

2.4 Configure passwordless SSH login

[root@master ~]# su - hadoop        // switch to the hadoop user
[hadoop@master ~]$ ssh-keygen       // press Enter at every prompt
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:7/AdrpSyYwm34ux0t5DIoUjvltJQh8ZeSZc0Bwzxn68 hadoop@master
The key's randomart image is:
+---[RSA 2048]----+
|      o=+o.      |
|      ..+o       |
|   . o o.        |
|    = +  . .     |
|  .+ o. S o      |
| ..o.o.o.o o     |
|  .oo.+o*o= o    |
|  ..+o..=X = .   |
|   o.o+o..E.o    |
+----[SHA256]-----+
# Copy the key locally and test the login:
[hadoop@master .ssh]$ ssh-copy-id  hadoop@master
[hadoop@master .ssh]$ ssh hadoop@master

# Distribute the key to each slave node
[hadoop@master ~]$ ssh-copy-id hadoop@slave1
[hadoop@master ~]$ ssh-copy-id hadoop@slave2
# Log in to each node yourself to verify.
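A quick loop to confirm that no node still asks for a password (using the hostnames configured earlier):

for host in master slave1 slave2; do ssh hadoop@$host hostname; done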

2.5 Deploy Hadoop

[root@master ~]# tar xf hadoop-2.7.3.tar.gz 
[root@master ~]# mkdir /usr/local/hadoop
[root@master ~]# mv hadoop-2.7.3/* /usr/local/hadoop/
[root@master ~]# chown -R hadoop.hadoop /usr/local/hadoop

Configure the Hadoop environment variables (add the following to the hadoop user's .bashrc):

[root@master ~]# su - hadoop
[hadoop@master ~]$ cat .bashrc 
# .bashrc
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:

source .bashrc

Go to Hadoop's etc/hadoop/ directory and edit the configuration files.

Edit core-site.xml

cat /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>4096</value>
        </property>
</configuration>

Notes:

fs.defaultFS is the default file system, i.e. the NameNode address, in the form hdfs://hostname(or IP):port.

io.file.buffer.size is the buffer size used when reading and writing SequenceFiles; a larger buffer reduces the number of I/O operations.

Edit hdfs-site.xml

vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml 

<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/usr/local/hadoop/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/usr/local/hadoop/dfs/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>master:50090</value>
        </property>
</configuration>

Notes:

dfs.namenode.name.dir is the local directory where the NameNode stores namespace and edit-log metadata; the default is /tmp/hadoop-{username}/dfs/name.

dfs.datanode.data.dir is the local directory where a DataNode stores HDFS blocks, given as a file:// URI or plain path; the default is /tmp/hadoop-{username}/dfs/data.

dfs.replication is the number of replicas kept for each HDFS block; here it is 2, one per DataNode.

dfs.namenode.secondary.http-address sets the SecondaryNameNode host and port (omit it if you do not need a separate SecondaryNameNode).

 

Edit mapred-site.xml

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

vim /usr/local/hadoop/etc/hadoop/mapred-site.xml


<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <final>true</final>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>master:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>master:19888</value>
        </property>
</configuration>

Note:

mapreduce.framework.name selects the framework that runs MapReduce jobs; the default is local and must be changed to yarn here. The two jobhistory properties set the JobHistory server's RPC and web UI addresses.

Edit yarn-site.xml

vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.acl.enable</name>
                <value>false</value>
        </property>
        <property>
                <name>yarn.admin.acl</name>
                <value>*</value>
        </property>
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>false</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>master:8032</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>master:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>master:8035</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>master:8033</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>master:8088</value>
        </property>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>

</configuration>

Note:

yarn.nodemanager.aux-services names the shuffle service that MapReduce applications use, and yarn.nodemanager.aux-services.mapreduce.shuffle.class names the class that implements it.

Set the JAVA_HOME installation directory

vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh


export JAVA_HOME=/usr/local/jdk
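In Hadoop 2.7.3 this line ships as "export JAVA_HOME=${JAVA_HOME}", so a one-line replacement (using the path configured earlier) is:

sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/local/jdk|' /usr/local/hadoop/etc/hadoop/hadoop-env.sh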

List the slave nodes managed by the master's NameNode and ResourceManager:

vi /usr/local/hadoop/etc/hadoop/slaves

slave1
slave2

Copy Hadoop to the slaves

# Run on both slave nodes (as root)
mkdir /usr/local/hadoop
chown -R hadoop.hadoop /usr/local/hadoop

# Push the Hadoop files from the master (as the hadoop user)
$ scp -r /usr/local/hadoop/* hadoop@slave1:/usr/local/hadoop/
$ scp -r /usr/local/hadoop/* hadoop@slave2:/usr/local/hadoop/
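The original skips it, but the slaves also need the JDK and the hadoop user's environment before the daemons will start. Assuming the same layout on the slaves and root SSH access, something like:

# From the master: copy the JDK (as root)
scp -r /usr/local/jdk root@slave1:/usr/local/
scp -r /usr/local/jdk root@slave2:/usr/local/

# Copy the hadoop user's environment variables
scp ~hadoop/.bashrc hadoop@slave1:~/
scp ~hadoop/.bashrc hadoop@slave2:~/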

3. Run Hadoop

3.1 Format the distributed file system

[hadoop@master ~]$ hdfs namenode -format
18/11/06 07:02:23 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/10.0.0.200
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.3
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/l
...........................
.........................
18/11/06 07:02:29 INFO util.ExitUtil: Exiting with status 0
18/11/06 07:02:29 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/10.0.0.200
************************************************************/
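Formatting is a one-time step. If you ever reformat the NameNode, the DataNodes will fail to start with a clusterID mismatch until their data directories (as configured in hdfs-site.xml above) are cleared:

# On each slave, before restarting after a reformat
rm -rf /usr/local/hadoop/dfs/data/*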

3.2 Start Hadoop

[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@master ~]$ jps
ResourceManager
NameNode
Jps
SecondaryNameNode


[hadoop@slave2 ~]$ jps
NodeManager
Jps
DataNode

3.3 Check DFS usage:

[hadoop@master ~]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 36477861888 (33.97 GB)
Present Capacity: 31194517504 (29.05 GB)
DFS Remaining: 31194509312 (29.05 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 10.0.0.211:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2641661952 (2.46 GB)
DFS Remaining: 15597264896 (14.53 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.52%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Nov 06 07:08:45 CST 2018


Name: 10.0.0.212:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2641682432 (2.46 GB)
DFS Remaining: 15597244416 (14.53 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.52%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Nov 06 07:08:44 CST 2018

At this point the Hadoop cluster is up and running.

4. Test and Verify

1) First create the required directories (they must be created one level at a time; see the note after these commands)

$ hadoop dfs -mkdir /user
$ hadoop dfs -mkdir /user/hadoop
$ hadoop dfs -mkdir /user/hadoop/input
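"hadoop dfs" is deprecated in favor of "hdfs dfs", whose -p flag creates the whole path in one command:

hdfs dfs -mkdir -p /user/hadoop/input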

2) Create a test file

[hadoop@master ~]$ cat test.txt 

hello hadoop

hello World

Hello Java

CentOS System
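To create the same file in one step (ignoring the blank lines in the rendering above):

cat > test.txt <<EOF
hello hadoop
hello World
Hello Java
CentOS System
EOF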

3) Put the test file into the input directory

 hadoop dfs -put test.txt /user/hadoop/input

4) Run the WordCount example

hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.3-sources.jar  org.apache.hadoop.examples.WordCount /user/hadoop/input /user/hadoop/output
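The command above happens to work because the compiled example classes are already on Hadoop's classpath; the more conventional invocation uses the examples jar directly:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hadoop/input /user/hadoop/output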

5) View the generated word counts

[hadoop@master hadoop]$  hadoop dfs -ls /user/hadoop/output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-11-06 07:26 /user/hadoop/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup         58 2018-11-06 07:26 /user/hadoop/output/part-r-00000
[hadoop@master hadoop]$ hadoop dfs -cat /user/hadoop/output/part-r-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

CentOS    1
Hello    1
Java    1
System    1
World    1
hadoop    1
hello    2
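To rerun the job, remove the output directory first; MapReduce refuses to write into an existing one:

hdfs dfs -rm -r /user/hadoop/output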

 
