Official homepage: http://www.centos.org/
Official Wiki: http://wiki.centos.org/
Official Chinese documentation: http://wiki.centos.org/zh/Documentation
Installation docs: http://www.centos.org/docs/
1. Environment preparation
Three virtual machines:
Version | IP | Hostname |
centos7 | 192.168.136.129 | master |
centos7 | 192.168.136.130 | slave1 |
centos7 | 192.168.136.131 | slave2 |
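The remaining steps address the nodes by hostname, so every machine must be able to resolve master, slave1, and slave2. A minimal sketch, assuming the IPs from the table above and that no DNS is available (run on all three nodes):

cat >> /etc/hosts <<EOF
192.168.136.129 master
192.168.136.130 slave1
192.168.136.131 slave2
EOF
hostnamectl set-hostname master    # on the master; use slave1/slave2 on the other nodes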
JDK download: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Hadoop download: https://archive.apache.org/dist/hadoop/common/
2. Setup (perform on the master node first)
2.1 Disable the firewall
[root@master ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[root@master ~]# uname -r
3.10.0-862.el7.x86_64
[root@master ~]# sestatus
SELinux status:                 disabled
[root@master ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
# The firewall and SELinux must both be disabled.
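The output above shows the firewall and SELinux already turned off. On a fresh CentOS 7 install they are usually enabled; in that case the standard commands below disable them:

systemctl stop firewalld
systemctl disable firewalld
setenforce 0                                                    # disable SELinux for the current session
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config    # keep it disabled after reboot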
2.2 Create the hadoop user
[root@master ~]# useradd hadoop
[root@master ~]# id hadoop
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)
[root@master ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
2.3 Configure the Java environment
# Create a jdk directory, extract the Java tarball, and move its contents into place
[root@master ~]# mkdir /usr/local/jdk
[root@master ~]# tar xf jdk-7u79-linux-x64.tar.gz
[root@master ~]# mv jdk1.7.0_79/* /usr/local/jdk/
sed -i.ori '$a export JAVA_HOME=/usr/local/jdk\nexport PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH\nexport CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar' /etc/profile
source /etc/profile
[root@master ~]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
2.4 Configure passwordless SSH login
[root@master ~]# su - hadoop           # switch to the hadoop user
[hadoop@master ~]$ ssh-keygen          # press Enter at every prompt
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)?
[hadoop@master ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:7/AdrpSyYwm34ux0t5DIoUjvltJQh8ZeSZc0Bwzxn68 hadoop@master
The key's randomart image is:
+---[RSA 2048]----+
|       o=+o.     |
|      ..+o       |
|     . o o.      |
|      = + . .    |
|     .+ o. S o   |
|    ..o.o.o.o o  |
|   .oo.+o*o= o   |
|   ..+o..=X = .  |
|    o.o+o..E.o   |
+----[SHA256]-----+
# Just keep pressing Enter at the prompts.
# Local distribution test:
[hadoop@master .ssh]$ ssh-copy-id hadoop@master
[hadoop@master .ssh]$ ssh hadoop@master
# Distribute the key to each slave node:
[hadoop@master ~]$ ssh-copy-id hadoop@slave1
[hadoop@master ~]$ ssh-copy-id hadoop@slave2
# Log in to each node yourself to verify.
2.5 Deploy Hadoop
[root@master ~]# tar xf hadoop-2.7.3.tar.gz
[root@master ~]# mkdir /usr/local/hadoop
[root@master ~]# mv hadoop-2.7.3/* /usr/local/hadoop/
[root@master ~]# chown -R hadoop.hadoop /usr/local/hadoop
Configure the Hadoop environment
[root@master ~]# su - hadoop
[hadoop@master ~]$ cat .bashrc
# .bashrc
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:
[hadoop@master ~]$ source .bashrc
Switch to the Hadoop configuration directory, /usr/local/hadoop/etc/hadoop/, and edit the configuration files.
Edit core-site.xml
cat /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>
Notes:
The fs.defaultFS property is the address of the default file system name node (i.e., the NameNode), in the form hdfs://hostname(or IP):port.
The io.file.buffer.size property is the buffer size used when reading and writing SequenceFiles; a larger buffer reduces the number of I/O operations.
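With HADOOP_HOME/bin on the PATH (from the .bashrc above), the values actually picked up from core-site.xml can be checked without starting the cluster; a quick sanity check:

hdfs getconf -confKey fs.defaultFS          # should print hdfs://master:9000
hdfs getconf -confKey io.file.buffer.size   # should print 4096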
Edit hdfs-site.xml
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
</configuration>
Notes:
The dfs.namenode.name.dir property is the local filesystem directory where the NameNode stores namespace and edit-log metadata; the default is /tmp/hadoop-{username}/dfs/name.
The dfs.datanode.data.dir property is the local filesystem directory where a DataNode stores HDFS blocks, in the form file://local-directory; the default is /tmp/hadoop-{username}/dfs/data.
The dfs.replication property is the number of replicas kept for each data block; it is set to 2 here to match the two DataNodes.
The dfs.namenode.secondary.http-address property is the SecondaryNameNode host and port (this entry can be omitted if no separate SecondaryNameNode role needs to be specified).
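The name and data directories configured above do not exist in a freshly unpacked tree. The hdfs namenode -format step in section 3.1 creates the name directory, but creating both up front and making sure the hadoop user owns them avoids permission surprises; a small sketch, assuming the paths above:

mkdir -p /usr/local/hadoop/dfs/{name,data}
chown -R hadoop.hadoop /usr/local/hadoop/dfs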
Edit mapred-site.xml
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
Note:
The mapreduce.framework.name property is the runtime framework used to execute MapReduce jobs; the default is local, and it must be changed to yarn here.
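The two jobhistory addresses only matter if the JobHistory server is actually running; start-all.sh does not launch it, so after the cluster is up (section 3) it can be started by hand:

mr-jobhistory-daemon.sh start historyserver    # its web UI then listens on master:19888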
Edit yarn-site.xml
vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.acl.enable</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.admin.acl</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Note:
The yarn.nodemanager.aux-services property selects the auxiliary shuffle service (and its handler class) used by MapReduce applications.
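Once the daemons are started in section 3, the NodeManagers registered with the ResourceManager can be listed to confirm that the YARN configuration above was picked up:

yarn node -list    # should list slave1 and slave2 in RUNNING state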
Set the JAVA_HOME installation directory
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# line 26:
export JAVA_HOME=/usr/local/jdk
Specify the slave nodes managed by the cluster's master node (NameNode, ResourceManager)
vi /usr/local/hadoop/etc/hadoop/slaves
slave1
slave2
Copy Hadoop to the slaves
# Run on both slave nodes:
mkdir /usr/local/hadoop
chown -R hadoop.hadoop /usr/local/hadoop
# Push the Hadoop files from the master:
$ scp -r /usr/local/hadoop/* hadoop@slave1:/usr/local/hadoop/
$ scp -r /usr/local/hadoop/* hadoop@slave2:/usr/local/hadoop/
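The slaves also need the same JDK and shell environment as the master; that step is not shown above, so here is a minimal sketch, assuming the same /usr/local/jdk path and the hadoop user's .bashrc are reused on every node:

# as root on each slave, prepare the JDK directory
ssh root@slave1 'mkdir -p /usr/local/jdk'
# from the master, push the JDK and the hadoop user's environment file
scp -r /usr/local/jdk/* root@slave1:/usr/local/jdk/
scp /home/hadoop/.bashrc hadoop@slave1:/home/hadoop/
# repeat for slave2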
3. Run Hadoop
3.1 Format the distributed file system
[hadoop@master ~]$ hdfs namenode -format
18/11/06 07:02:23 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/10.0.0.200
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.3
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/l
...........................
.........................
18/11/06 07:02:29 INFO util.ExitUtil: Exiting with status 0
18/11/06 07:02:29 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/10.0.0.200
************************************************************/
[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@master ~]$ jps
ResourceManager
NameNode
Jps
SecondaryNameNode
[hadoop@slave2 ~]$ jps
NodeManager
Jps
DataNode
[hadoop@master ~]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 36477861888 (33.97 GB)
Present Capacity: 31194517504 (29.05 GB)
DFS Remaining: 31194509312 (29.05 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 10.0.0.211:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2641661952 (2.46 GB)
DFS Remaining: 15597264896 (14.53 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.52%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Nov 06 07:08:45 CST 2018

Name: 10.0.0.212:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2641682432 (2.46 GB)
DFS Remaining: 15597244416 (14.53 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.52%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Nov 06 07:08:44 CST 2018
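The web interfaces show the same information; with the ports configured earlier, these endpoints should respond once the daemons are up (a browser works too, curl is used here only as a quick check):

curl -s http://master:50070 | head    # HDFS NameNode web UI (default Hadoop 2.x port)
curl -s http://master:8088  | head    # YARN ResourceManager web UI (yarn.resourcemanager.webapp.address)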
At this point the entire Hadoop cluster environment is set up.
4. Test and verify
1) First create the required directories (they must be created one level at a time)
$ hadoop dfs -mkdir /user
$ hadoop dfs -mkdir /user/hadoop
$ hadoop dfs -mkdir /user/hadoop/input
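Alternatively, the whole path can be created in one command with the -p flag (hadoop fs is the non-deprecated form of hadoop dfs):

hadoop fs -mkdir -p /user/hadoop/input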
2) Create a test file
[hadoop@master ~]$ cat test.txt
hello hadoop
hello World
Hello Java
CentOS System
3) Put the test file into the input directory
hadoop dfs -put test.txt /user/hadoop/input
4) Run the WordCount program
hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.3-sources.jar org.apache.hadoop.examples.WordCount /user/hadoop/input /user/hadoop/output
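The command above runs WordCount out of the sources jar; the precompiled examples jar shipped with Hadoop 2.7.3 can be used the same way (path relative to /usr/local/hadoop, and the output directory must not already exist):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hadoop/input /user/hadoop/output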
5) View the generated word counts
[hadoop@master hadoop]$ hadoop dfs -ls /user/hadoop/output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-11-06 07:26 /user/hadoop/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup         58 2018-11-06 07:26 /user/hadoop/output/part-r-00000
[hadoop@master hadoop]$ hadoop dfs -cat /user/hadoop/output/part-r-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

CentOS  1
Hello   1
Java    1
System  1
World   1
hadoop  1
hello   2
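MapReduce refuses to write into an existing output directory, so to re-run the job the previous results must be removed first:

hadoop fs -rm -r /user/hadoop/output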