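Tags: foreach examples spl contex log mapreduce head style basics
Counting word frequencies in a file is the "hello world" of big data. Here is how little code it takes in the shell, in Spark, and in Hadoop MapReduce: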
egrep -o "\b[[:alpha:]]+\b" test_word.log|sort|uniq -c|sort -rn|head -10
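import org.apache.spark.{SparkConf, SparkContext}
val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
sc.textFile("test_word.log")
  .flatMap(_.split("\\s+"))   // split each line on whitespace
  .map((_, 1))                // pair every word with a count of 1
  .reduceByKey(_ + _)         // sum the counts per word
  .sortBy(_._2, false)        // sort by count, descending
  .take(10)
  .foreach(println)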
hadoop jar /path/hadoop-2.6.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.1.jar wordcount /tmp/wordcount/input /tmp/wordcount/output
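The Hadoop command above expects the input to already be in HDFS, and the job fails if the output directory already exists. The steps around it might look like the sketch below. It assumes the `hdfs` and `hadoop` commands are on the PATH; the function name `run_wordcount` is hypothetical, and the jar path and HDFS directories are the ones from the command above. Note that the bundled example splits on whitespace and writes every word with its count (tab-separated, sorted by word), not just the top 10.

```shell
# Hypothetical wrapper around the wordcount example; nothing runs until it is called.
run_wordcount() {
  local_file="$1"   # local file to count, e.g. test_word.log

  # Upload the input file to HDFS.
  hdfs dfs -mkdir -p /tmp/wordcount/input
  hdfs dfs -put -f "$local_file" /tmp/wordcount/input

  # The job refuses to start if the output directory already exists.
  hdfs dfs -rm -r -f /tmp/wordcount/output

  hadoop jar /path/hadoop-2.6.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.1.jar \
    wordcount /tmp/wordcount/input /tmp/wordcount/output

  # Print the reducer output (one "word<TAB>count" line per word).
  hdfs dfs -cat '/tmp/wordcount/output/part-r-*'
}
```

Usage, on a machine with a configured Hadoop client: `run_wordcount test_word.log`.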
Appendix: the test file test_word.log contains:
hello world
hello www
The output is:
2 hello
1 world
1 www
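To check this result locally, the sketch below recreates the two-line test file and runs the same shell pipeline on it. The function name `top_words` is hypothetical, and `grep -E` is used as the equivalent of `egrep`. `uniq -c` pads the counts with leading spaces, and the order of the two words that occur once may differ from the listing above.

```shell
# Hypothetical helper: print the N most frequent words in a file (default 10).
top_words() {
  grep -Eo "\b[[:alpha:]]+\b" "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Recreate the test file from the article.
printf 'hello world\nhello www\n' > test_word.log

top_words test_word.log
```

Passing a second argument limits the result, for example `top_words test_word.log 1` prints only the most frequent word.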
Original article: https://www.cnblogs.com/barneywill/p/10115301.html