MapReduce编程(四) 求均值

时间：2017-03-31 14:35:58 阅读：233 评论：0 收藏：0 [点我收藏+]

标签：except 编码 put for word each while rabl addclass

一、问题描述

三个文件中分别存储了学生的语文、数学和英语成绩，输出每个学生的平均分。

数据格式如下：
Chinese.txt

张三    78
李四    89
王五    96
赵六    67

Math.txt

张三    88
李四    99
王五    66
赵六    77

English.txt

张三    80
李四    82
王五    84
赵六    86

二、MapReduce编程

package com.javacore.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;


/**
 * Created by bee on 3/29/17.
 */
public class StudentAvgDouble {

    public static class MyMapper extends Mapper<Object, Text, Text, DoubleWritable> {

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
           String eachline = value.toString();
           StringTokenizer tokenizer = new StringTokenizer(eachline, "\n");
            while (tokenizer.hasMoreElements()) {
                StringTokenizer tokenizerLine = new StringTokenizer(tokenizer
                        .nextToken());
                String strName = tokenizerLine.nextToken();
                String strScore = tokenizerLine.nextToken();
                Text name = new Text(strName);
                IntWritable score = new IntWritable(Integer.parseInt(strScore));
                context.write(name, score);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        public void reduce(Text key, Iterable<DoubleWritable> values, Context
                context) throws IOException, InterruptedException {
            double sum = 0.0;
            int count = 0;
            for (DoubleWritable val : values) {
                sum += val.get();
                count++;
            }
            DoubleWritable avgScore = new DoubleWritable(sum / count);
            context.write(key, avgScore);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        //删除output文件夹
        FileUtil.deleteDir("output");
        Configuration conf = new Configuration();
        String[] otherArgs = new String[]{"input/studentAvg", "output"};
        if (otherArgs.length != 2) {
            System.out.println("参数错误");
            System.exit(2);
        }

        Job job = Job.getInstance();
        job.setJarByClass(StudentAvgDouble.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }
}

三、StringTokenizer和Split的用法对比

map函数里按行读入，每行按空格切开，之前我采用的split()函数切分，代码如下。

 String eachline = value.toString();
 for (String eachline : lines) {
                System.out.println("eachline:\t"+eachline);
                String[] words = eachline.split("\\s+");
                Text name = new Text(words[0]);
                IntWritable score = new IntWritable(Integer.parseInt(words[1]));
                context.write(name, score);
            }

这种方式简单明了，但是也存在缺陷，对于非正常编码的空格有时候会出现切割失败的情况。
StringTokenizer是java.util包中分割解析类，StringTokenizer类的构造函数有三个:

StringTokenizer（String str）：java默认的分隔符是“空格”、“制表符（‘\t’）”、“换行符(‘\n’）”、“回车符（‘\r’）。
StringTokenizer（String str,String delim）:可以构造一个用来解析str的StringTokenizer对象，并提供一个指定的分隔符。
StringTokenizer（String str,String delim,boolean returnDelims）：构造一个用来解析str的StringTokenizer对象，并提供一个指定的分隔符，同时，指定是否返回分隔符。

StringTokenizer和Split都可以对字符串进行切分，StringTokenizer的性能更高一些，分隔符如果用到一些特殊字符，StringTokenizer的处理结果更好。

四、运行结果

张三  82.0
李四  90.0
王五  82.0
赵六  76.66666666666667

MapReduce编程(四) 求均值

标签：except 编码 put for word each while rabl addclass

原文地址：http://blog.csdn.net/napoay/article/details/68923805

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行