
001-Hadoop Distributed Cluster Installation

Date: 2016-04-08 14:59:26


1. Use VMware Workstation to install a virtual machine running Ubuntu (Master.Hadoop)
 
2. Configure the VM's IP address and hostname
    2.1 Configure the IP address:
        root@Master:~# cat /etc/network/interfaces
        # This file describes the network interfaces available on your system
        # and how to activate them. For more information, see interfaces(6).
 
        # The loopback network interface
        auto lo
        iface lo inet loopback
 
        # The primary network interface
        auto eth0
        iface eth0 inet static
        address 192.168.142.141
        netmask 255.255.255.0
        gateway 192.168.142.1
        root@Master:~#
    2.2 Configure the hostname:
        root@Master:~# cat /etc/hostname
        Master.Hadoop
 
3. Create a hadoop group:
    3.1 As root, run: groupadd hadoop
 
4. Create a hadoop user:
    4.1 As root, run: useradd -s /bin/bash -d /home/hadoop -m hadoop -g hadoop
 
5. Install the JDK and configure environment variables
    5.1 Download the Linux JDK package and upload it to /opt/java
    5.2 Extract the package; this yields the directory /opt/java/jdk1.7.0_79
    5.3 Configure the environment variables (append to the end of the file):
        vi /etc/profile
        ...
        export JAVA_HOME=/opt/java/jdk1.7.0_79
        export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
        export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
        export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
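After appending these lines, run `source /etc/profile` (or log in again) for them to take effect. As a sanity check of how the PATH additions compose, the assembly can be sketched as follows (the JDK/Hadoop paths are the ones assumed throughout this guide):

```shell
# Sketch of how the PATH additions from the exports above compose;
# the JDK/Hadoop paths are the ones assumed throughout this guide.
JAVA_HOME=/opt/java/jdk1.7.0_79
HADOOP_HOME=/home/hadoop/hadoop-2.6.4
PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
# Show the three directories just appended, one per line
echo "$PATH" | tr ':' '\n' | tail -n 3
```

On the real machine, `java -version` and `hadoop version` are the quickest confirmation that the variables resolved.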
 
6. Clone the VM, open each copy in VMware, and change its IP address and hostname, giving three Ubuntu nodes:
    6.1 Master.Hadoop: 192.168.142.141
    6.2 Slave1.Hadoop: 192.168.142.142
    6.3 Slave2.Hadoop: 192.168.142.143
    6.4 Change the IP address: vi /etc/network/interfaces
    6.5 Change the hostname: vi /etc/hostname
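For example, on Slave1 only the address line of /etc/network/interfaces changes (netmask and gateway stay as on Master), and /etc/hostname becomes Slave1.Hadoop:

```
auto eth0
iface eth0 inet static
address 192.168.142.142
netmask 255.255.255.0
gateway 192.168.142.1
```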
 
7. Configure the hosts file on all three VMs (add the hostname mappings for all three machines)
    root@Master:~# cat /etc/hosts
    127.0.0.1       localhost
    127.0.1.1       ubuntu-cc.localdomain   ubuntu-cc
    192.168.142.141 Master.Hadoop
    192.168.142.142 Slave1.Hadoop
    192.168.142.143 Slave2.Hadoop
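Since all three machines need identical mappings, it helps to keep them in one block and append the same block on each node; a minimal sketch:

```shell
# The three mappings from this guide in one variable, so the identical
# block can be appended on every node (as root: printf '%s\n' "$hosts_block" >> /etc/hosts).
hosts_block='192.168.142.141 Master.Hadoop
192.168.142.142 Slave1.Hadoop
192.168.142.143 Slave2.Hadoop'
printf '%s\n' "$hosts_block"
```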
 
8. Set up passwordless SSH login
    8.1 As the hadoop user on Master: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    8.2 As the hadoop user on Master, in ~/.ssh: cp id_dsa.pub authorized_keys
    8.3 On Slave1 and Slave2, run: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    8.4 Append Master's id_dsa.pub to the authorized_keys file on Slave1 and on Slave2
    8.5 Test passwordless login: ssh Slave1.Hadoop
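Step 8.4 is plain concatenation: authorized_keys is simply the list of public keys allowed to log in. A local sketch of the merge (on the real cluster the same append is typically done with something like `cat ~/.ssh/id_dsa.pub | ssh hadoop@Slave1.Hadoop 'cat >> ~/.ssh/authorized_keys'`, or `ssh-copy-id` where available):

```shell
# authorized_keys is just the concatenation of permitted public keys; this
# helper makes the merge of step 8.4 explicit. File names are stand-ins.
merge_keys() {
    cat "$@"
}
```

Permissions matter here: ~/.ssh should be 700 and authorized_keys 600, or sshd may silently refuse the key.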
 
9. Download a Hadoop release from https://hadoop.apache.org/releases.html (this guide uses 2.6.4)
 
10. Upload the downloaded hadoop-2.6.4.tar.gz to the hadoop user's home directory, /home/hadoop, and extract it
 
11. Go to /home/hadoop/hadoop-2.6.4/etc/hadoop and configure Hadoop
    11.1 vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master.Hadoop:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>

 

    11.2 vi hadoop-env.sh and yarn-env.sh; add the following near the top of each (it must be an absolute path, otherwise startup fails with "JAVA_HOME is not set and could not be found."):
        export JAVA_HOME=/opt/java/jdk1.7.0_79
 
    11.3 vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master.Hadoop:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

   

    11.4 vi mapred-site.xml (if it does not exist, copy it from mapred-site.xml.template)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>Master.Hadoop:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master.Hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master.Hadoop:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://Master.Hadoop:9001</value>
    </property>
</configuration>

 

    11.5 vi yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master.Hadoop</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master.Hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master.Hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master.Hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master.Hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master.Hadoop:8088</value>
    </property>
</configuration>

       

 
12. In the hadoop user's home directory, create the working directories: mkdir tmp dfs dfs/name dfs/data (this completes the configuration of a single server)
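The same layout in one command, run as the hadoop user in /home/hadoop; the paths match hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir configured above:

```shell
# Same layout as step 12 in one command; -p creates parent directories
# and is harmless if a directory already exists. Run in /home/hadoop.
mkdir -p tmp dfs/name dfs/data
```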
 
13. Single-machine verification
    13.1 hadoop@Master:~/hadoop-2.6.4/bin$ hdfs namenode -format
    13.2 hadoop@Master:~/hadoop-2.6.4/bin$ cd ../sbin/
    13.3 hadoop@Master:~/hadoop-2.6.4/sbin$ start-dfs.sh
    13.4 hadoop@Master:~/hadoop-2.6.4/sbin$ start-yarn.sh
    13.5 hadoop@Master:~/hadoop-2.6.4/sbin$ mr-jobhistory-daemon.sh  start historyserver
    13.6 Check URL 1: http://192.168.142.141:50070/
    13.7 Check URL 2: http://192.168.142.141:8088/
 
14. Run: hadoop dfsadmin -report
    hadoop@Master:~/hadoop-2.6.4/bin$ hadoop dfsadmin -report
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
 
    Configured Capacity: 121819234304 (113.45 GB)
    Present Capacity: 111630487552 (103.96 GB)
    DFS Remaining: 111630434304 (103.96 GB)
    DFS Used: 53248 (52 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
 
    -------------------------------------------------
    Live datanodes (2):
 
    Name: 192.168.142.143:50010 (Slave2.Hadoop)
    Hostname: Slave2.Hadoop
    Decommission Status : Normal
    Configured Capacity: 60909617152 (56.73 GB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 5094354944 (4.74 GB)
    DFS Remaining: 55815237632 (51.98 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 91.64%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Feb 23 17:18:59 CST 2016
 
 
    Name: 192.168.142.142:50010 (Slave1.Hadoop)
    Hostname: Slave1.Hadoop
    Decommission Status : Normal
    Configured Capacity: 60909617152 (56.73 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 5094391808 (4.74 GB)
    DFS Remaining: 55815196672 (51.98 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 91.64%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Feb 23 17:18:59 CST 2016
 
15. Configure slaves:
    15.1 Go to /home/hadoop/hadoop-2.6.4/etc/hadoop
    15.2 Add the node entries
        hadoop@Master:~/hadoop-2.6.4/etc/hadoop$ vi slaves
        Slave1.Hadoop
        Slave2.Hadoop
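If Master should also store HDFS blocks and run YARN containers, add it to this file as well (see note 19.1 below; without it, Master runs no DataNode):

```
Master.Hadoop
Slave1.Hadoop
Slave2.Hadoop
```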
 
16. Copy Hadoop to the slave machines:
    16.1 scp -r /home/hadoop/hadoop-2.6.4 192.168.142.142:/home/hadoop
    16.2 scp -r /home/hadoop/hadoop-2.6.4 192.168.142.143:/home/hadoop
 
17. Final verification
    17.1 Start the services
        hadoop@Master:~/hadoop-2.6.4/sbin$ ./stop-all.sh
        This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
        Stopping namenodes on [Master.Hadoop]
        Master.Hadoop: stopping namenode
        Slave1.Hadoop: stopping datanode
        Slave2.Hadoop: stopping datanode
        stopping yarn daemons
        stopping resourcemanager
 
        Slave2.Hadoop: stopping nodemanager
        Slave1.Hadoop: stopping nodemanager
        no proxyserver to stop
        hadoop@Master:~/hadoop-2.6.4/sbin$ ./start-all.sh
        This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
        Starting namenodes on [Master.Hadoop]
        Master.Hadoop: starting namenode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-namenode-Master.Hadoop.out
        Slave2.Hadoop: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-Slave2.Hadoop.out
        Slave1.Hadoop: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-Slave1.Hadoop.out
        starting yarn daemons
        starting resourcemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-Master.Hadoop.out
        Slave1.Hadoop: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-Slave1.Hadoop.out
        Slave2.Hadoop: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-Slave2.Hadoop.out
        hadoop@Master:~/hadoop-2.6.4/sbin$
    17.2 Processes on Master
        hadoop@Master:~/hadoop-2.6.4/sbin$ jps
        4870 NameNode
        5112 ResourceManager
        2897 JobHistoryServer
        5384 Jps
    17.3 Processes on a slave
        hadoop@Slave1:~$ jps
        2614 DataNode
        2886 Jps
        2758 NodeManager
 
18. View the nodes: http://192.168.142.141:8088/cluster/nodes
 
19. Notes
    19.1 Note 1: In step 15.2, if Master is not added to the slaves file, the final node list will not include Master, and Master will not run a DataNode process
    19.2 Note 2: If sudo does not work, add this line to /etc/sudoers: hadoop   ALL=(ALL)       ALL (where hadoop is the user that needs sudo)
 
20. Running the test program from Eclipse
    20.1 Data preparation
        hadoop@Master:~$ hadoop fs -mkdir /input
        hadoop@Master:~$ hadoop fs -put ./hadoop-2.6.4/README.txt /input
        hadoop@Master:~$ hadoop fs -chmod 777 /
    20.2 Code
package com.ttfisher;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the input line
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each word
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // combiner pre-sums on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
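The pipeline this job implements can be mimicked locally with plain shell, which makes a handy mental model: `tr` plays the map (one word per line), `sort` plays the shuffle that groups identical keys, and `uniq -c` plays the reduce that sums each group. A sketch with made-up input:

```shell
# "map": one word per line; "shuffle": sort groups identical words;
# "reduce": uniq -c sums each group. The input lines are made up.
printf 'hello world\nhello hadoop\n' | tr -s ' ' '\n' | sort | uniq -c
```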

    20.3 Program arguments
        hdfs://Master.Hadoop:9000/input/README.txt hdfs://Master.Hadoop:9000/output
    20.4 VM arguments
        -Xmx512m
    20.5 Run on Hadoop


Original article: http://www.cnblogs.com/bigshushu/p/5367959.html
