
Running Your First MapReduce Program on CentOS

Date: 2014-11-26 14:18:14

Tags: centos   hadoop   maxtemperature   failed fetch notific   maven

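Before following the steps in this article, set up a Hadoop environment. For experimentation a single-node deployment is enough; for details see: Creating a Hadoop 1.2.1 Single-Node Environment on CentOS 6.5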

  • Write the source code

  • Mapper

    package com.eric.hadoop.map;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MaxTemperatureMapper extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {
      // Value that marks a missing temperature reading.
      private static final int MISSING = 9999;

      public void map(LongWritable fileOffset, Text lineRecord,
          OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = lineRecord.toString();
        // Debug trace of each input record.
        System.out.println("##Processing Record:" + line);
        String year = line.substring(15, 19);
        int temperature;
        // Skip a leading '+' before parsing the signed temperature field.
        if (line.charAt(87) == '+') {
          temperature = Integer.parseInt(line.substring(88, 92));
        } else {
          temperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        // Emit only readings that are present and pass the quality check.
        if (temperature != MISSING && quality.matches("[01459]")) {
          output.collect(new Text(year), new IntWritable(temperature));
        }
      }
    }
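
The fixed-width parsing in the mapper can be tried without a cluster. The following is a minimal sketch, not part of the original project: the class name TemperatureFieldSketch and the helper syntheticRecord are hypothetical, and the record it builds is zero-padded filler rather than real weather data. Only the column offsets (15-19 for the year, 87-92 for the signed temperature, 92-93 for the quality code) come from the mapper above.

```java
public class TemperatureFieldSketch {
    static final int MISSING = 9999;

    /** Returns the raw temperature, or null if it is missing or fails the quality check. */
    public static Integer parseTemperature(String line) {
        // Skip a leading '+' the same way the mapper does.
        int start = (line.charAt(87) == '+') ? 88 : 87;
        int temperature = Integer.parseInt(line.substring(start, 92));
        String quality = line.substring(92, 93);
        return (temperature != MISSING && quality.matches("[01459]")) ? temperature : null;
    }

    /** Builds a made-up 93-character record with only the three fields of interest filled in. */
    public static String syntheticRecord(String year, String signedTemp, String quality) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 93; i++) {
            sb.append('0');
        }
        sb.replace(15, 19, year);        // 4-character year
        sb.replace(87, 92, signedTemp);  // sign plus 4 digits
        sb.replace(92, 93, quality);     // 1-character quality code
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = syntheticRecord("1902", "-0011", "1");
        System.out.println(line.substring(15, 19) + " -> " + parseTemperature(line));
    }
}
```

A record whose temperature field is +9999, or whose quality code is outside [01459], yields null here, which corresponds to the mapper emitting nothing for that line.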

  • Reduce

    package com.eric.hadoop.reduce;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class MaxTemperatureReduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text year, Iterator<IntWritable> temperatures,
          OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int maxTemperature = Integer.MIN_VALUE;
        // Debug trace; this prints the iterator object, not the individual values.
        System.out.println("##Processing temperatures:" + temperatures);
        // Keep the largest temperature seen for this year.
        while (temperatures.hasNext()) {
          maxTemperature = Math.max(maxTemperature, temperatures.next().get());
        }
        output.collect(year, new IntWritable(maxTemperature));
      }
    }
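
The reducer's logic is a plain running maximum over a single-pass iterator. As a rough illustration outside Hadoop, the sketch below applies the same loop to an in-memory list; the class name MaxReduceSketch and the sample readings are made up for this example.

```java
import java.util.Arrays;
import java.util.Iterator;

public class MaxReduceSketch {
    /** Same fold as reduce(): start from Integer.MIN_VALUE and keep the larger value. */
    public static int maxOf(Iterator<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        while (temperatures.hasNext()) {
            max = Math.max(max, temperatures.next());
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up readings for a single year, in raw units.
        Iterator<Integer> readings = Arrays.asList(-11, 22, -50, 17).iterator();
        System.out.println(maxOf(readings));
    }
}
```

Seeding with Integer.MIN_VALUE means any real reading replaces the initial value, including negative ones. An empty iterator would return Integer.MIN_VALUE.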

  • Main

    package com.eric.hadoop.jobconfig;

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    import com.eric.hadoop.map.MaxTemperatureMapper;
    import com.eric.hadoop.reduce.MaxTemperatureReduce;

    public class MaxTemperature {
      public static void main(String[] args) throws IOException {
        // Validate the arguments before touching the job configuration.
        if (args.length != 2) {
          System.err.println("Must contain 2 params: inputPath outputPath");
          System.exit(-1);
        }
        JobConf conf = new JobConf(MaxTemperature.class);
        conf.setJobName("Get Max Temperature!");
        FileInputFormat.addInputPaths(conf, args[0]);
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Wire in the mapper and reducer and declare the output types.
        conf.setMapperClass(MaxTemperatureMapper.class);
        conf.setReducerClass(MaxTemperatureReduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // Submit the job and wait for it to finish.
        JobClient.runJob(conf);
      }
    }

  • Build the JAR file

    mvn install
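  • Get the test data

    hadoop dfs -mkdir testdata
    hadoop dfs -put 1902 testdata

    Do not create the output directory in advance: Hadoop refuses to run a job whose output path already exists. If an earlier run left one behind, remove it with hadoop dfs -rmr output.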
  • Run the job

    hadoop jar hadoop-0.0.1-SNAPSHOT.jar testdata/1902 output
  • Observe the output

    [hadoop@localhost ~]$ hadoop jar hadoop-0.0.1-SNAPSHOT.jar testdata/1902 output 
    Warning: $HADOOP_HOME is deprecated. 

    14/11/26 13:33:39 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
    14/11/26 13:33:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library 
    14/11/26 13:33:39 WARN snappy.LoadSnappy: Snappy native library not loaded 
    14/11/26 13:33:39 INFO mapred.FileInputFormat: Total input paths to process : 1 
    14/11/26 13:33:40 INFO mapred.JobClient: Running job: job_201411261331_0002 # job ID
    14/11/26 13:33:41 INFO mapred.JobClient: map 0% reduce 0% 
    14/11/26 13:33:47 INFO mapred.JobClient: map 100% reduce 0% # map progress
    14/11/26 13:33:54 INFO mapred.JobClient: map 100% reduce 33% 
    14/11/26 13:33:56 INFO mapred.JobClient: map 100% reduce 100% # reduce progress
    14/11/26 13:33:57 INFO mapred.JobClient: Job complete: job_201411261331_0002 
    14/11/26 13:33:57 INFO mapred.JobClient: Counters: 30 
    14/11/26 13:33:57 INFO mapred.JobClient: Job Counters 
    14/11/26 13:33:57 INFO mapred.JobClient: Launched reduce tasks=1 
    14/11/26 13:33:57 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=7744 
    14/11/26 13:33:57 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 
    14/11/26 13:33:57 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 
    14/11/26 13:33:57 INFO mapred.JobClient: Launched map tasks=2 
    14/11/26 13:33:57 INFO mapred.JobClient: Data-local map tasks=2 
    14/11/26 13:33:57 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9008 
    14/11/26 13:33:57 INFO mapred.JobClient: File Input Format Counters 
    14/11/26 13:33:57 INFO mapred.JobClient: Bytes Read=890953 
    14/11/26 13:33:57 INFO mapred.JobClient: File Output Format Counters 
    14/11/26 13:33:57 INFO mapred.JobClient: Bytes Written=9 
    14/11/26 13:33:57 INFO mapred.JobClient: FileSystemCounters 
    14/11/26 13:33:57 INFO mapred.JobClient: FILE_BYTES_READ=72221 
    14/11/26 13:33:57 INFO mapred.JobClient: HDFS_BYTES_READ=891143 
    14/11/26 13:33:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=309368 
    14/11/26 13:33:57 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=9 
    14/11/26 13:33:57 INFO mapred.JobClient: Map-Reduce Framework 
    14/11/26 13:33:57 INFO mapred.JobClient: Map output materialized bytes=72227 
    14/11/26 13:33:57 INFO mapred.JobClient: Map input records=6565 # number of map input records
    14/11/26 13:33:57 INFO mapred.JobClient: Reduce shuffle bytes=72227 
    14/11/26 13:33:57 INFO mapred.JobClient: Spilled Records=13130 
    14/11/26 13:33:57 INFO mapred.JobClient: Map output bytes=59085 
    14/11/26 13:33:57 INFO mapred.JobClient: Total committed heap usage (bytes)=478543872 
    14/11/26 13:33:57 INFO mapred.JobClient: CPU time spent (ms)=4400 # CPU time consumed
    14/11/26 13:33:57 INFO mapred.JobClient: Map input bytes=888978 
    14/11/26 13:33:57 INFO mapred.JobClient: SPLIT_RAW_BYTES=190 
    14/11/26 13:33:57 INFO mapred.JobClient: Combine input records=0 
    14/11/26 13:33:57 INFO mapred.JobClient: Reduce input records=6565 # number of reduce input records
    14/11/26 13:33:57 INFO mapred.JobClient: Reduce input groups=1 
    14/11/26 13:33:57 INFO mapred.JobClient: Combine output records=0 
    14/11/26 13:33:57 INFO mapred.JobClient: Physical memory (bytes) snapshot=501690368 
    14/11/26 13:33:57 INFO mapred.JobClient: Reduce output records=1 # number of reduce output records
    14/11/26 13:33:57 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2167922688 
    14/11/26 13:33:57 INFO mapred.JobClient: Map output records=6565 # number of map output records
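
The counters in this log can serve as a quick sanity check. With no combiner configured (Combine input records=0), the number of map output records should match the number of reduce input records, and here both are 6565. The sketch below is one hypothetical way to pull the name=value counters out of JobClient log text and compare them; the class name CounterLogSketch and the regular expression are assumptions made for illustration, not a Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CounterLogSketch {
    // Matches the "name=value" part that follows "mapred.JobClient:" on a counter line.
    private static final Pattern COUNTER =
        Pattern.compile("mapred\\.JobClient:\\s*(.+?)=(\\d+)");

    /** Collects every counter line in the log into a name -> value map. */
    public static Map<String, Long> parse(String log) {
        Map<String, Long> counters = new LinkedHashMap<String, Long>();
        for (String line : log.split("\n")) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // Four lines copied from the job output above.
        String log =
            "14/11/26 13:33:57 INFO mapred.JobClient: Map input records=6565\n"
          + "14/11/26 13:33:57 INFO mapred.JobClient: Map output records=6565\n"
          + "14/11/26 13:33:57 INFO mapred.JobClient: Reduce input records=6565\n"
          + "14/11/26 13:33:57 INFO mapred.JobClient: Reduce output records=1\n";
        Map<String, Long> c = parse(log);
        boolean consistent = c.get("Map output records").equals(c.get("Reduce input records"));
        System.out.println("map output == reduce input: " + consistent);
        System.out.println("reduce output records: " + c.get("Reduce output records"));
    }
}
```

Lines without a name=value pair, such as the progress lines or "Counters: 30", are skipped.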
  • Check the result

  • Troubleshooting

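    Problem: the map phase completes normally, but the reduce phase hangs at 0% and makes no progress no matter how long you wait.
    Log error: 2011-10-03 09:46:13,349 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task attempt_201110022127_0003_m_000000_0

    1. Make the hostname in /etc/hosts match the HOSTNAME value in /etc/sysconfig/network, then reboot the system after editing the files.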
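
The fix presumably works because reducers fetch map output from the TaskTracker by hostname, so a name that does not resolve stalls the copy phase. The consistency check can be expressed in a few lines. This is a hedged sketch that works on the text of the two files rather than reading them from disk; the class name HostnameCheckSketch and the hostname hadoop-node are hypothetical.

```java
public class HostnameCheckSketch {
    /** Extracts the HOSTNAME value from the text of /etc/sysconfig/network, or null if absent. */
    public static String configuredHostname(String networkFile) {
        for (String line : networkFile.split("\n")) {
            String t = line.trim();
            if (t.startsWith("HOSTNAME=")) {
                return t.substring("HOSTNAME=".length()).trim();
            }
        }
        return null;
    }

    /** True if some entry in the text of /etc/hosts lists the name after its IP address. */
    public static boolean hostsContains(String hostsFile, String name) {
        for (String line : hostsFile.split("\n")) {
            // Drop comments, then split "IP name alias..." on whitespace.
            int hash = line.indexOf('#');
            String t = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (t.isEmpty()) {
                continue;
            }
            String[] fields = t.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equals(name)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical file contents showing the mismatch described above.
        String network = "NETWORKING=yes\nHOSTNAME=hadoop-node\n";
        String hosts = "127.0.0.1 localhost localhost.localdomain\n";
        String name = configuredHostname(network);
        System.out.println(name + " listed in /etc/hosts: " + hostsContains(hosts, name));
    }
}
```

In this made-up case the check reports false, which is the situation the step above corrects by adding the hostname to /etc/hosts.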
