
First Hadoop Program (Hadoop 2.4.0 Cluster + Eclipse Environment)

Posted: 2015-04-07 13:29:08


I. Eclipse Hadoop environment setup

1. Right-click My Computer -> Properties -> Advanced system settings -> Environment Variables, and configure the following variables:

        JAVA_HOME=D:\ProgramFiles\Java\jdk1.7.0_67

      HADOOP_HOME=D:\TEDP_Software\hadoop-2.4.0

      PATH=.;%JAVA_HOME%\bin;%HADOOP_HOME%\bin;

2. Install the hadoop-eclipse-kepler-plugin-2.2.0.jar plugin in Eclipse and configure the Hadoop server location.

II. The WordCount program

1. Prepare test files
[hadoop@master hadoop]$ mkdir file

[hadoop@master hadoop]$ cd file

[hadoop@master file]$ ls
[hadoop@master file]$ echo "Hello world" > file1.txt
[hadoop@master file]$ echo "Hello hadoop" > file2.txt

2. Create the input directory
Create the Hadoop user directory: hadoop fs -mkdir /user
Set permissions: hadoop fs -chmod -R 777 /user
Create the input directory: hadoop fs -mkdir /user/input
List the directories: hadoop fs -ls /
Upload the files to HDFS: hadoop fs -put ~/file/file*.txt /user/input
Error 1:
java.net.NoRouteToHostException: No route to host
(or, in Hive: could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.)
Cause: the firewall is still running. Switch to root on each host and run: service iptables stop
 
3. Create a new MapReduce project and copy the attached WordCount.java into it.
Right-click the WordCount class -> Run As -> Run Configurations, and enter the following program arguments:
hdfs://192.168.1.200:9000/user/input hdfs://192.168.1.200:9000/user/output
 
4. Run on Hadoop
(1) Exception 1: Exception in thread "main" java.lang.NullPointerException
Fix: according to posts found online, this is a bug in Hadoop on Windows and does not occur on Linux. Download hadoop-common-2.2.0-bin-master.zip, unzip it, and replace the files in .\hadoop-2.4.0\bin with the ones from its bin directory.

Then copy bin\hadoop.dll into C:\Windows\System32 and reboot.

(2) Exception 2: 14/12/02 21:01:01 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Fix: set the local environment variable HADOOP_HOME=D:\Soft\Linux\hadoop-2.4.0 (a restart is required).

To avoid restarting, add this line to the code instead: System.setProperty("hadoop.home.dir", "D:\\Soft\\Linux\\hadoop-2.4.0");
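Both of the errors above come down to Hadoop failing to resolve a home directory on Windows. As a quick sanity check, the resolution order can be mimicked in plain Java (this is a simplified sketch, not Hadoop's actual Shell code; the class and method names are mine):

```java
public class HadoopHomeCheck {
    // Simplified mimic of how Hadoop's Shell class picks the home directory:
    // the hadoop.home.dir system property takes precedence, then the
    // HADOOP_HOME environment variable.
    static String resolveHadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        if (home == null) {
            home = System.getenv("HADOOP_HOME");
        }
        // A null result here is what later surfaces as "null\bin\winutils.exe".
        return home;
    }

    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "D:\\Soft\\Linux\\hadoop-2.4.0");
        System.out.println("Hadoop home resolves to: " + resolveHadoopHome());
    }
}
```

If this prints null before you set the property, the winutils error above is what you can expect from the real job.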
(3) Exception 3: Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.1.200:9000/user/output already exists

Fix: the output directory already exists. Either change the output path or delete the directory first (e.g. hadoop fs -rm -r /user/output).

(4) Problem 4: the run simply hangs with no output (this happened later, while building a second Hadoop project).

Fix: in Run Configurations -> Main, the main class turned out to be jline.ANSIBuffer; change it to WordCount and click "Run".

Note: if you launch via "Run As" -> "Run on Hadoop", type or select WordCount in the Select Type dialog.


5. OK. The result:

Hello 2

hadoop 1

world 1
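The result can be sanity-checked without a cluster: the job's tokenize-and-sum logic boils down to the plain-Java sketch below (the LocalWordCount class is mine, for illustration only, and is not part of the attached job):

```java
import java.util.*;

public class LocalWordCount {
    // The same logic the MR job runs: split each line on whitespace
    // (StringTokenizer, as in the Map class) and sum the per-word counts
    // (as in the Reduce class). TreeMap gives sorted output like HDFS does.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tok = new StringTokenizer(line);
            while (tok.hasMoreTokens()) {
                counts.merge(tok.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The two test lines from file1.txt and file2.txt in step 1.
        Map<String, Integer> c = count(Arrays.asList("Hello world", "Hello hadoop"));
        c.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Running it on the two test lines prints the same three counts shown above.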

6. Attachment: WordCount.java
 
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emit (word, 1) for every whitespace-separated token in the line.
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts for each word.
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {

        // Uncomment on Windows if HADOOP_HOME is not set (see exception 2 above):
        // System.setProperty("hadoop.home.dir", "D:\\Soft\\Linux\\hadoop-2.4.0");

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] and args[1] are the HDFS input and output paths
        // entered in Run Configurations.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

Reference: http://www.cnblogs.com/xia520pi/archive/2012/05/16/2504205.html

(The End)
 

Original post: http://www.cnblogs.com/zhaohz/p/4397953.html
