Connecting Eclipse to Hadoop on Windows

Posted: 2014-12-10 00:20:15


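Hadoop runs in virtual machines here. Connecting to a remote cluster works the same way: you only need the master's IP address and the settings from core-site.xml.

A distributed Hadoop cluster was set up on VMware: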

192.168.11.134 master

192.168.11.135 slave1

192.168.11.136 slave2

core-site.xml configuration:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
    <description>The name of the default file system.</description>
</property>

<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/setup/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
</property>
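To make the role of these two properties concrete, here is a small sketch in plain Java (standard library only, no Hadoop classes) that reads the name/value pairs out of a core-site.xml-style document. The class name CoreSiteReader, the method readProperties, and the wrapping <configuration> element in the sample are assumptions for illustration; Hadoop itself loads this file through its own Configuration class.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hadoop: reads <property> entries
// from a core-site.xml-style document into a map.
final class CoreSiteReader {

    static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = textOf(property, "name");
            String value = textOf(property, "value");
            // Skip entries that lack either a name or a value.
            if (name != null && value != null) {
                props.put(name, value);
            }
        }
        return props;
    }

    // Trimmed text of the first descendant element with the given tag, or null.
    private static String textOf(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Sample input mirrors the two properties shown above.
        String xml = "<configuration>"
                + "<property><name>fs.defaultFS</name>"
                + "<value>hdfs://master:9000</value></property>"
                + "<property><name>hadoop.tmp.dir</name>"
                + "<value>/usr/setup/hadoop/temp</value></property>"
                + "</configuration>";
        Map<String, String> props = readProperties(xml);
        System.out.println("fs.defaultFS   = " + props.get("fs.defaultFS"));
        System.out.println("hadoop.tmp.dir = " + props.get("hadoop.tmp.dir"));
    }
}
```

The fs.defaultFS value is the one that matters for the Eclipse side; hadoop.tmp.dir only affects where the cluster nodes keep their working files.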

1 Download the plugin

hadoop-eclipse-plugin-2.5.1.jar

The source on GitHub has to be compiled yourself. A prebuilt plugin jar is used here instead.

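2 Configure the plugin

Copy the plugin into the ..\eclipse\plugins directory, restart Eclipse, and then set the Hadoop installation directory.

If the plugin installed correctly, Window > Preferences shows a Hadoop Map/Reduce entry on the left. Select it and set the Hadoop installation path on the right. (On Windows it is enough to extract hadoop-2.5.1.tar.gz to a directory of your choice.)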


3 Configure Map/Reduce Locations

Open Window > Open Perspective > Other, select Map/Reduce, and click OK. The Map/Reduce Locations view then appears in the console area.


Right-click in that view and choose New Hadoop location, then fill in:

Location Name: any name will do.

Map/Reduce Master and DFS Master: set Host and Port to match the settings in core-site.xml.


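Click Finish to close the dialog.

In the left pane, expand DFS Locations > master (the location name set in the previous step). If the user directory is visible, the installation succeeded.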
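The Host and Port entered for DFS Master are simply the two halves of the fs.defaultFS value. As a rough sketch, the split can be done with java.net.URI from the standard library. DefaultFsEndpoint and parse are hypothetical names, and the helper insists on an explicit port, which is stricter than Hadoop itself.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop or the Eclipse plugin: splits an
// fs.defaultFS value into the Host and Port fields of the DFS Master section.
final class DefaultFsEndpoint {
    final String host;
    final int port;

    private DefaultFsEndpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static DefaultFsEndpoint parse(String defaultFs) {
        URI uri = URI.create(defaultFs);
        // Assumption of this sketch: the value always looks like hdfs://host:port.
        if (!"hdfs".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("expected hdfs://host:port, got " + defaultFs);
        }
        return new DefaultFsEndpoint(uri.getHost(), uri.getPort());
    }

    public static void main(String[] args) {
        DefaultFsEndpoint endpoint = parse("hdfs://master:9000");
        System.out.println("Host: " + endpoint.host);
        System.out.println("Port: " + endpoint.port);
    }
}
```

With the core-site.xml from this post the result is Host master and Port 9000. On a Windows machine that cannot resolve the name master, the IP address 192.168.11.134 can be entered as Host instead.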


4 WordCount example

Choose File > Project, select Map/Reduce Project, and enter a project name such as WordCount. In the WordCount project, create a new class named WordCount with the following code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser; // needed only when the paths come from program arguments

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.11.134:9000/in/test*.txt")); // path 1: input
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.11.134:9000/output")); // path 2: output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
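For readers who want to see what the job computes before touching the cluster, the same logic can be sketched in plain Java without any Hadoop classes. LocalWordCount is a hypothetical name, and this is only an in-memory illustration of the map and reduce steps, not how Hadoop executes them.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of what the WordCount job computes; no Hadoop involved.
final class LocalWordCount {

    static Map<String, Integer> count(Iterable<String> lines) {
        // TreeMap keeps the words sorted, similar to the sorted keys in the job output.
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            // "map" step: split the line on whitespace, one (word, 1) pair per token
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // "reduce" step: add the 1 to the running total for that word
                totals.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = count(Arrays.asList("hello hadoop", "hello eclipse"));
        totals.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the two sample lines this prints eclipse 1, hadoop 1 and hello 2, one pair per line. Like the mapper above, it treats tokens as case-sensitive and does not strip punctuation.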

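Paths 1 and 2 above are hard-coded, so nothing has to be set in the run configuration. If they were instead read from the program arguments:

String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

then the paths must be supplied in the run configuration: right-click the class and choose Run As > Run Configurations.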


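Enter the HDFS file paths there (they were marked in red in the original screenshot).

Click Run, or choose Run on Hadoop. The results appear under DFS Locations. If the view does not show changes made during the run, right-click DFS Locations and choose Disconnect to refresh it.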

[Screenshot: run result]

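5 Problems and fixes

5.1 NullPointerException

1 Put winutils.exe in Hadoop's bin directory.

2 Set the HADOOP_HOME environment variable.

3 Copy hadoop.dll to C:\Windows\System32.

Both files (winutils.exe and hadoop.dll) are available in the hadoop-common-2.2.0-bin-master.zip archive.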
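Steps 1 and 2 can be checked before running the job with a small pre-flight sketch. As far as I know, the Hadoop 2.x client looks for its home directory in the hadoop.home.dir Java system property first and then in the HADOOP_HOME environment variable, and the sketch below assumes that order. WinutilsCheck and its methods are hypothetical names, not part of Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check, not part of Hadoop: verifies that
// winutils.exe sits in the bin directory of the configured Hadoop home.
final class WinutilsCheck {

    // Assumed lookup order: hadoop.home.dir system property, then HADOOP_HOME.
    static String resolveHadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        return home != null ? home : System.getenv("HADOOP_HOME");
    }

    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    static boolean hasWinutils(String hadoopHome) {
        return hadoopHome != null && Files.isRegularFile(winutilsPath(hadoopHome));
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome();
        if (hasWinutils(home)) {
            System.out.println("Found " + winutilsPath(home));
        } else {
            System.out.println("winutils.exe not found; Hadoop home resolved to: " + home);
        }
    }
}
```

If the check reports that nothing was found even though HADOOP_HOME was just set, restarting Eclipse is worth a try, since a running process does not pick up environment variables changed after it started.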

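5.2 Cannot upload files to HDFS

Files had already been uploaded to HDFS during setup. After the NameNode was reformatted on restart with hdfs namenode -format, uploads failed. The fix is to delete the stale HDFS data:

On master, delete the current directory under the dfs name directory: rm -rf current/

On each slave, delete the entire dfs data directory: rm -rf data/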

5.3 log4j warnings

Put a log4j.properties file under src, in the same directory as the Java file.

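5.4 Insufficient access permissions

Reference: http://www.linuxidc.com/Linux/2014-08/105335.htm

Method 1 (did not work for me):

Fixing insufficient permissions when developing against a remote Hadoop cluster from Eclipse:

If the user name logged in to Windows differs from the Hadoop cluster's user name, access is denied.

Solution:

Manage the DFS system directory. The current approach is to turn off permission checking on the cluster: on master, edit hadoop-1.2.0/conf/hdfs-site.xml and add:

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

For production, create a user on the server with the same name as the Hadoop cluster user instead, so that the permissions policy on master does not have to be changed.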

Method 2 (worked): run on the master node:

hadoop fs -chmod 777 /user

Here /user is the path I upload files to (adjust it for your setup).

Method 3: change the computer's user name to hadoop.
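A code-level variation on Method 3, which is not from the original post and which I have not verified on this cluster: with simple (non-Kerberos) authentication, the Hadoop 2.x client, to my knowledge, takes the user name from HADOOP_USER_NAME (environment variable first, then Java system property) before falling back to the operating system login. Under that assumption, the sketch below sets the property when nothing else is configured. HadoopUserSetup and ensureUser are hypothetical names.

```java
// Hypothetical helper, not part of Hadoop: makes sure a Hadoop user name is
// configured before the client is created. Assumes simple authentication.
final class HadoopUserSetup {
    static final String USER_KEY = "HADOOP_USER_NAME";

    // Returns the effective user name; sets the system property only when
    // neither the environment variable nor the property is already present.
    static String ensureUser(String clusterUser) {
        String existing = System.getenv(USER_KEY);
        if (existing == null) {
            existing = System.getProperty(USER_KEY);
        }
        if (existing != null) {
            return existing;
        }
        System.setProperty(USER_KEY, clusterUser);
        return clusterUser;
    }

    public static void main(String[] args) {
        // "hadoop" is the cluster user name used in Method 3.
        System.out.println("Hadoop user: " + ensureUser("hadoop"));
    }
}
```

The call would have to run at the very start of WordCount.main, before the Configuration and Job objects are created. Compared with renaming the Windows account, this only affects the one JVM that runs the job.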


Original post: http://www.cnblogs.com/baixl/p/4154429.html
