
001-Hadoop Distributed Cluster Installation

Date: 2016-04-08 14:59:26


1. Use VMware Workstation to install a virtual machine running Ubuntu (Master.Hadoop)
 
2. Configure the VM's IP address and hostname
    2.1 Configure the IP address:
        root@Master:~# cat /etc/network/interfaces
        # This file describes the network interfaces available on your system
        # and how to activate them. For more information, see interfaces(6).
 
        # The loopback network interface
        auto lo
        iface lo inet loopback
 
        # The primary network interface
        auto eth0
        iface eth0 inet static
        address 192.168.142.141
        netmask 255.255.255.0
        gateway 192.168.142.1
        root@Master:~#
    2.2 Configure the hostname:
        root@Master:~# cat /etc/hostname
        Master.Hadoop
 
3. Create a hadoop group:
    3.1 As root, run: groupadd hadoop
 
4. Create a hadoop user:
    4.1 As root, run: useradd -s /bin/bash -d /home/hadoop -m hadoop -g hadoop
 
5. Install the JDK and configure environment variables
    5.1 Download the Linux JDK package and upload it to /opt/java
    5.2 Extract the package; this yields the directory /opt/java/jdk1.7.0_79
    5.3 Configure the environment variables (append to the end of the file):
        vi /etc/profile
        ...
        export JAVA_HOME=/opt/java/jdk1.7.0_79
        export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
        export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
        export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
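After appending these lines, run `source /etc/profile` (or log in again) for them to take effect. As a sanity check of how the PATH additions compose, the assembly can be sketched as follows (the JDK/Hadoop paths are the ones assumed throughout this guide):

```shell
# Sketch of how the PATH additions from the exports above compose;
# the JDK/Hadoop paths are the ones assumed throughout this guide.
JAVA_HOME=/opt/java/jdk1.7.0_79
HADOOP_HOME=/home/hadoop/hadoop-2.6.4
PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
# Show the three directories just appended, one per line
echo "$PATH" | tr ':' '\n' | tail -n 3
```

On the real machine, `java -version` and `hadoop version` are the quickest confirmation that the variables resolved.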
 
6. Clone the VM, open each copy in VMware, and change its IP address and hostname, giving three Ubuntu nodes:
    6.1 Master.Hadoop: 192.168.142.141
    6.2 Slave1.Hadoop: 192.168.142.142
    6.3 Slave2.Hadoop: 192.168.142.143
    6.4 Change the IP address: vi /etc/network/interfaces
    6.5 Change the hostname: vi /etc/hostname
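For example, on Slave1 only the address line of /etc/network/interfaces changes (netmask and gateway stay as on Master), and /etc/hostname becomes Slave1.Hadoop:

```
auto eth0
iface eth0 inet static
address 192.168.142.142
netmask 255.255.255.0
gateway 192.168.142.1
```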
 
7. Configure the hosts file on all three VMs (add the hostname mappings for all three machines)
    root@Master:~# cat /etc/hosts
    127.0.0.1       localhost
    127.0.1.1       ubuntu-cc.localdomain   ubuntu-cc
    192.168.142.141 Master.Hadoop
    192.168.142.142 Slave1.Hadoop
    192.168.142.143 Slave2.Hadoop
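Since all three machines need identical mappings, it helps to keep them in one block and append the same block on each node; a minimal sketch:

```shell
# The three mappings from this guide in one variable, so the identical
# block can be appended on every node (as root: printf '%s\n' "$hosts_block" >> /etc/hosts).
hosts_block='192.168.142.141 Master.Hadoop
192.168.142.142 Slave1.Hadoop
192.168.142.143 Slave2.Hadoop'
printf '%s\n' "$hosts_block"
```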
 
8. Set up passwordless SSH login
    8.1 As the hadoop user on Master: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    8.2 As the hadoop user on Master, in ~/.ssh: cp id_dsa.pub authorized_keys
    8.3 On Slave1 and Slave2, run: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    8.4 Append Master's id_dsa.pub to the authorized_keys file on Slave1 and on Slave2
    8.5 Test passwordless login: ssh Slave1.Hadoop
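Step 8.4 is plain concatenation: authorized_keys is simply the list of public keys allowed to log in. A local sketch of the merge (on the real cluster the same append is typically done with something like `cat ~/.ssh/id_dsa.pub | ssh hadoop@Slave1.Hadoop 'cat >> ~/.ssh/authorized_keys'`, or `ssh-copy-id` where available):

```shell
# authorized_keys is just the concatenation of permitted public keys; this
# helper makes the merge of step 8.4 explicit. File names are stand-ins.
merge_keys() {
    cat "$@"
}
```

Permissions matter here: ~/.ssh should be 700 and authorized_keys 600, or sshd may silently refuse the key.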
 
9. Download a Hadoop release from https://hadoop.apache.org/releases.html (this guide uses 2.6.4)
 
10. Upload the downloaded hadoop-2.6.4.tar.gz to the hadoop user's home directory, /home/hadoop, and extract it
 
11. Go to /home/hadoop/hadoop-2.6.4/etc/hadoop and configure Hadoop
    11.1 vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master.Hadoop:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>

 

    11.2 vi hadoop-env.sh and yarn-env.sh; add the following near the top of each (it must be an absolute path, otherwise startup fails with "JAVA_HOME is not set and could not be found."):
        export JAVA_HOME=/opt/java/jdk1.7.0_79
 
    11.3 vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master.Hadoop:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

   

    11.4 vi mapred-site.xml (if it does not exist, copy it from mapred-site.xml.template)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>Master.Hadoop:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master.Hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master.Hadoop:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://Master.Hadoop:9001</value>
    </property>
</configuration>

 

    11.5 vi yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master.Hadoop</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master.Hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master.Hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master.Hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master.Hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master.Hadoop:8088</value>
    </property>
</configuration>

       

 
12. In the hadoop user's home directory, create the working directories: mkdir tmp dfs dfs/name dfs/data (this completes the configuration of a single server)
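The same layout in one command, run as the hadoop user in /home/hadoop; the paths match hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir configured above:

```shell
# Same layout as step 12 in one command; -p creates parent directories
# and is harmless if a directory already exists. Run in /home/hadoop.
mkdir -p tmp dfs/name dfs/data
```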
 
13. Single-machine verification
    13.1 hadoop@Master:~/hadoop-2.6.4/bin$ hdfs namenode -format
    13.2 hadoop@Master:~/hadoop-2.6.4/bin$ cd ../sbin/
    13.3 hadoop@Master:~/hadoop-2.6.4/sbin$ start-dfs.sh
    13.4 hadoop@Master:~/hadoop-2.6.4/sbin$ start-yarn.sh
    13.5 hadoop@Master:~/hadoop-2.6.4/sbin$ mr-jobhistory-daemon.sh  start historyserver
    13.6 Check URL 1: http://192.168.142.141:50070/
    13.7 Check URL 2: http://192.168.142.141:8088/
 
14. Run: hadoop dfsadmin -report
    hadoop@Master:~/hadoop-2.6.4/bin$ hadoop dfsadmin -report
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
 
    Configured Capacity: 121819234304 (113.45 GB)
    Present Capacity: 111630487552 (103.96 GB)
    DFS Remaining: 111630434304 (103.96 GB)
    DFS Used: 53248 (52 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
 
    -------------------------------------------------
    Live datanodes (2):
 
    Name: 192.168.142.143:50010 (Slave2.Hadoop)
    Hostname: Slave2.Hadoop
    Decommission Status : Normal
    Configured Capacity: 60909617152 (56.73 GB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 5094354944 (4.74 GB)
    DFS Remaining: 55815237632 (51.98 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 91.64%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Feb 23 17:18:59 CST 2016
 
 
    Name: 192.168.142.142:50010 (Slave1.Hadoop)
    Hostname: Slave1.Hadoop
    Decommission Status : Normal
    Configured Capacity: 60909617152 (56.73 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 5094391808 (4.74 GB)
    DFS Remaining: 55815196672 (51.98 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 91.64%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Feb 23 17:18:59 CST 2016
 
15. Configure slaves:
    15.1 Go to /home/hadoop/hadoop-2.6.4/etc/hadoop
    15.2 Add the node entries
        hadoop@Master:~/hadoop-2.6.4/etc/hadoop$ vi slaves
        Slave1.Hadoop
        Slave2.Hadoop
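If Master should also store HDFS blocks and run YARN containers, add it to this file as well (see note 19.1 below; without it, Master runs no DataNode):

```
Master.Hadoop
Slave1.Hadoop
Slave2.Hadoop
```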
 
16. Copy Hadoop to the slave machines:
    16.1 scp -r /home/hadoop/hadoop-2.6.4 192.168.142.142:/home/hadoop
    16.2 scp -r /home/hadoop/hadoop-2.6.4 192.168.142.143:/home/hadoop
 
17. Final verification
    17.1 Start the services
        hadoop@Master:~/hadoop-2.6.4/sbin$ ./stop-all.sh
        This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
        Stopping namenodes on [Master.Hadoop]
        Master.Hadoop: stopping namenode
        Slave1.Hadoop: stopping datanode
        Slave2.Hadoop: stopping datanode
        stopping yarn daemons
        stopping resourcemanager
 
        Slave2.Hadoop: stopping nodemanager
        Slave1.Hadoop: stopping nodemanager
        no proxyserver to stop
        hadoop@Master:~/hadoop-2.6.4/sbin$ ./start-all.sh
        This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
        Starting namenodes on [Master.Hadoop]
        Master.Hadoop: starting namenode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-namenode-Master.Hadoop.out
        Slave2.Hadoop: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-Slave2.Hadoop.out
        Slave1.Hadoop: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-Slave1.Hadoop.out
        starting yarn daemons
        starting resourcemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-Master.Hadoop.out
        Slave1.Hadoop: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-Slave1.Hadoop.out
        Slave2.Hadoop: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-Slave2.Hadoop.out
        hadoop@Master:~/hadoop-2.6.4/sbin$
    17.2 Processes on Master
        hadoop@Master:~/hadoop-2.6.4/sbin$ jps
        4870 NameNode
        5112 ResourceManager
        2897 JobHistoryServer
        5384 Jps
    17.3 Processes on a slave
        hadoop@Slave1:~$ jps
        2614 DataNode
        2886 Jps
        2758 NodeManager
 
18. View the nodes: http://192.168.142.141:8088/cluster/nodes
 
19. Notes
    19.1 Note 1: In step 15.2, if Master is not added to the slaves file, the final node list will not include Master, and Master will not run a DataNode process
    19.2 Note 2: If sudo does not work, add this line to /etc/sudoers: hadoop   ALL=(ALL)       ALL (where hadoop is the user that needs sudo)
 
20. Running the test program from Eclipse
    20.1 Data preparation
        hadoop@Master:~$ hadoop fs -mkdir /input
        hadoop@Master:~$ hadoop fs -put ./hadoop-2.6.4/README.txt /input
        hadoop@Master:~$ hadoop fs -chmod 777 /
    20.2 Code
package com.ttfisher;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the input line
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each word
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // combiner pre-sums on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
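The pipeline this job implements can be mimicked locally with plain shell, which makes a handy mental model: `tr` plays the map (one word per line), `sort` plays the shuffle that groups identical keys, and `uniq -c` plays the reduce that sums each group. A sketch with made-up input:

```shell
# "map": one word per line; "shuffle": sort groups identical words;
# "reduce": uniq -c sums each group. The input lines are made up.
printf 'hello world\nhello hadoop\n' | tr -s ' ' '\n' | sort | uniq -c
```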

    20.3 Program arguments
        hdfs://Master.Hadoop:9000/input/README.txt hdfs://Master.Hadoop:9000/output
    20.4 VM arguments
        -Xmx512m
    20.5 Run on Hadoop


Original article: http://www.cnblogs.com/bigshushu/p/5367959.html
