
My First MapReduce Program

Date: 2016-06-17 23:41:23


Goal: count the frequency of each word in a file.

       The wordcount driver program wires together the wordmap and wordreduce classes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class wordcount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(wordcount.class);

        job.setMapperClass(wordmap.class);
        job.setReducerClass(wordreduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the input path; args[1] is the output path (must not exist yet)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Exit with a non-zero status if the job fails
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


      wordmap receives <key, value> pairs as input, where key is the position (byte offset) of the current line in the input file and value is the content of that line. It splits the line into words and, for each word it cuts off, emits a pair of the form <word, 1>, where word is the token text and 1 records a single occurrence.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class wordmap extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of the line; value is the line itself
        String line = value.toString();
        for (String token : line.split(" ")) {
            if (!token.isEmpty()) {       // skip empty tokens from repeated spaces
                word.set(token);
                context.write(word, one); // emit <word, 1>
            }
        }
    }
}


      wordreduce receives data of the form <word, {1,1,1,1,…}>, that is, a particular word together with the list of counts emitted for it, where each "1" marks one occurrence of the word. For every "1" in the list it adds 1 to the word's running total, and finally writes out the result as <word, sum>.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class wordreduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the 1s emitted by the mapper for this word
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum)); // emit <word, sum>
    }
}
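      The map and reduce steps above can be sketched as a single-machine simulation in plain Java, with no Hadoop dependency. This is only an illustration of the logic, not the Hadoop execution model; the class name `LocalWordCount` and the helper `countWords` are my own, not part of the program above.

```java
import java.util.HashMap;
import java.util.Map;

// A local sketch of the same logic: the "map" step splits lines into
// <word, 1> pairs, and the "reduce" step sums the 1s per word.
public class LocalWordCount {

    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : text.split("\n")) {
            for (String word : line.split(" ")) {        // the "map" split
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum); // the "reduce" sum
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Order of entries in a HashMap is not guaranteed
        System.out.println(countWords("hello world\nhello hadoop"));
    }
}
```

On the input "hello world\nhello hadoop", the result maps hello to 2 and world and hadoop to 1 each, which is exactly the <word, sum> output the reducer produces.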


Original source: http://www.cnblogs.com/k-yang/p/5595334.html
