Hadoop Programming Tips (2) --- Counter

Date: 2014-07-18 22:29:12

Hadoop version used to test the code: 2.4

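Use case: when writing a Hadoop program, we sometimes want to learn a few properties of the data while the algorithm runs, such as the total number of records or the number of map output records. This is what a Counter is for; figures like these are available directly once the job has finished.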

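For statistics specific to your own data, such as the number of records that start with 1, you need to define a custom Counter. There are two ways to do so. The first is to define an enum type, for example: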

public enum MyCounters {
	ALL_RECORDS, ONE_WORD_lINE
}

Then increment it wherever a count is needed:

cxt.getCounter(MyCounters.ONE_WORD_lINE).increment(1); // first way: increment an enum-based counter

The second way is to use strings, passing a group name and a counter name directly at the point where the count is needed:

cxt.getCounter("MyCounters_String", "One Word In a Line").increment(1); // second way: increment a string-named counter

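This also lets you count the records that satisfy a specific condition.
Example: use a custom Counter to count the records (lines) that contain only one word.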
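
To make the relationship between the two styles concrete, here is a minimal plain-Java model; it is not the Hadoop API and needs no Hadoop dependency. It assumes that every counter is identified by a (group, name) pair, and that an enum-based counter uses the enum's class name as the group and the constant's name as the counter name, which is my understanding of how Hadoop resolves enum keys. The class name CounterModel and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of counters (NOT the Hadoop API): each counter is
// identified by a (group, name) pair and holds a long value.
class CounterModel {
    // Mirrors the article's enum; the identifiers are kept as in the article.
    enum MyCounters { ALL_RECORDS, ONE_WORD_lINE }

    private final Map<String, Long> values = new LinkedHashMap<>();

    // Enum style: group = the enum's class name, name = the constant's name.
    void increment(Enum<?> key, long amount) {
        increment(key.getDeclaringClass().getName(), key.name(), amount);
    }

    // String style: the caller supplies the group and the name directly.
    void increment(String group, String name, long amount) {
        values.merge(group + "/" + name, amount, Long::sum);
    }

    // Returns the current value, or 0 if the counter was never incremented.
    long get(String group, String name) {
        return values.getOrDefault(group + "/" + name, 0L);
    }

    public static void main(String[] args) {
        CounterModel counters = new CounterModel();
        counters.increment(MyCounters.ONE_WORD_lINE, 1);
        counters.increment("MyCounters_String", "One Word In a Line", 1);
        // Print every (group/name, value) pair recorded so far.
        counters.values.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this model the only difference between the two styles is who chooses the group and name: the enum style derives both from the type, so a misspelled counter fails at compile time, while the string style accepts any text at run time.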

Starting from the code in "Hadoop Programming Tips (1) --- Map-Side Aggregation", add the following:

public enum MyCounters {
	ALL_RECORDS, ONE_WORD_lINE
}

public static void main(String[] args) throws Exception {
	ToolRunner.run(new Configuration(), new InMapArrgegationDriver(), args);
}

protected void map(LongWritable key, Text value, Context cxt) {
	String[] line = value.toString().split(" "); // split the line on single spaces
	if (line.length == 1) {
		// the line holds exactly one word: increment both counters
		cxt.getCounter(MyCounters.ONE_WORD_lINE).increment(1); // enum-based counter
		cxt.getCounter("MyCounters_String", "One Word In a Line").increment(1); // string-named counter
	}
	for (String word : line) {
		Word curr = new Word(word, 1);
		if (words.contains(curr)) {
			// the word was seen before: increase its frequency
			for (Word w : words) {
				if (w.equals(curr)) {
					w.frequency++;
					break;
				}
			}
		} else {
			// first occurrence of the word
			words.add(curr);
		}
	}
}
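
The counting condition can be checked without a cluster. The sketch below applies the same test (split on a single space, count lines that yield exactly one token) to a small, made-up list of lines; the class name OneWordLineCheck and the sample data are hypothetical and are not the article's input file. Note that an empty line also produces exactly one (empty) token under String.split, so it is counted as a one-word line by this condition.

```java
import java.util.Arrays;
import java.util.List;

class OneWordLineCheck {
    // Same condition as in map(): split on a single space and count the
    // lines for which the split yields exactly one token.
    static long countOneWordLines(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            if (line.split(" ").length == 1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: two one-word lines and one empty line.
        List<String> lines = Arrays.asList("hello world", "hadoop", "a b c", "counter", "");
        // Prints 3: "hadoop", "counter", and the empty line all match.
        System.out.println(countOneWordLines(lines));
    }
}
```

If empty lines should not be counted, one option is to skip lines for which value.toString().isEmpty() is true before splitting.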

Three lines of the input contain only one word, so the count above should be 3. The job output shows:

[Screenshot of the job's counter output; image not available]

Here the enum-based Counter MyCounters.ONE_WORD_lINE and the string-based Counter One Word In a Line both report 3.

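Summary: a Counter lets you collect statistics about the data without affecting the program's original logic, though personally I do not find it all that useful.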

Share, grow, enjoy.

When reposting, please credit the blog: http://blog.csdn.net/fansy1990


Original article: http://blog.csdn.net/fansy1990/article/details/37882053
