Once the Hadoop cluster is set up, simply extract the Spark archive.
Spark installation package
http://yunpan.cn/csPh8cf2n5WrT (extraction code: 1085)
Spark commands: count the lines in README.md, find lines containing a keyword, and get the file's first line
val lines = sc.textFile("README.md")

lines.count()

lines.first()

val pythonLines = lines.filter(line => line.contains("Python"))

scala> lines.first()
res0: String = ## Interactive Python Shell
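The same filter/count/first logic can be tried without a cluster on an ordinary Scala collection, since Spark's RDD operations mirror the standard collection API. The sample lines below are made up for illustration; they stand in for the contents of README.md.

```scala
// A local sketch of the README.md example: a List of strings plays the
// role of the RDD returned by sc.textFile. (Sample lines are invented.)
val readmeLines = List(
  "## Interactive Python Shell",
  "Spark is a fast cluster computing system.",
  "Run the Python examples with ./bin/pyspark."
)

// Equivalent of lines.filter(line => line.contains("Python"))
val pythonLines = readmeLines.filter(line => line.contains("Python"))

println(readmeLines.size)   // equivalent of lines.count()
println(pythonLines.size)   // how many lines mention "Python"
println(readmeLines.head)   // equivalent of lines.first()
```

The key difference on a real cluster is laziness: `filter` on an RDD builds a plan, and nothing is computed until an action such as `count()` or `first()` runs.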
Spark commands: summing the elements of an array
1. Run ./spark-shell.sh

2. scala> val data = Array(1, 2, 3, 4, 5) // create the data

data: Array[Int] = Array(1, 2, 3, 4, 5)

3. scala> val distData = sc.parallelize(data) // turn data into an RDD

distData: spark.RDD[Int] = spark.ParallelCollection@7a0ec850 (the displayed type is RDD)

4. scala> distData.reduce(_+_) // run a computation on the RDD, summing its elements

12/05/10 09:36:20 INFO spark.SparkContext: Starting job...

5. The run finally produces:

12/05/10 09:36:20 INFO spark.SparkContext: Job finished in 0.076729174 s

res2: Int = 15
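The reduction itself behaves exactly like `reduce` on a plain Scala collection: `_+_` is an anonymous function that adds two elements, and `reduce` folds the array pairwise with it. A minimal local sketch:

```scala
// reduce(_ + _) combines elements pairwise with +, so the sum is 15.
// Spark's RDD.reduce applies the same associative function, but first
// within each partition and then across partition results.
val data = Array(1, 2, 3, 4, 5)
val sum = data.reduce(_ + _)
println(sum)  // 15
```

Because Spark combines partial results in an unspecified order, the function passed to `RDD.reduce` must be associative (and in practice commutative); `+` on integers satisfies both.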
Spark commands: word count
val lines = sc.textFile("README.md")
val count = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
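The three-step pipeline can be traced on local collections: `flatMap` splits each line into words, `map` pairs each word with a 1, and `reduceByKey` sums the 1s per word. On a plain Scala `List` there is no `reduceByKey`, so `groupBy` plus a per-group sum plays the same role. The sample lines are invented.

```scala
// Word-count semantics without a cluster. groupBy(identity) + size
// stands in for map(word => (word, 1)).reduceByKey(_ + _).
val textLines = List("spark hadoop spark", "hadoop hdfs")

val counts = textLines
  .flatMap(line => line.split(" "))                         // all words
  .groupBy(identity)                                        // word -> occurrences
  .map { case (word, occurrences) => (word, occurrences.size) }

println(counts)
```

On a real RDD, `reduceByKey` is preferred over a `groupBy`-style approach because it pre-aggregates within each partition before shuffling, moving far less data across the network.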
Spark commands: running the bundled pi-estimation example
./run-example org.apache.spark.examples.SparkPi
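SparkPi estimates pi by the Monte Carlo method: throw random points into the unit square and count the fraction that land inside the quarter circle of radius 1; that fraction approaches pi/4. A local single-threaded sketch of the same idea (the real example parallelizes this loop across the cluster; the sample size and seed below are arbitrary choices for illustration):

```scala
// Monte Carlo pi: of n random points (x, y) in [0, 1) x [0, 1), the
// fraction with x^2 + y^2 <= 1 approximates pi / 4.
val rng = new scala.util.Random(42)  // fixed seed for reproducibility
val n = 1000000

val inside = (1 to n).count { _ =>
  val x = rng.nextDouble()
  val y = rng.nextDouble()
  x * x + y * y <= 1.0
}

val piEstimate = 4.0 * inside / n
println(f"Pi is roughly $piEstimate%.4f")
```

With a million samples the estimate typically lands within a few thousandths of pi; the bundled example accepts a slice count argument to split the sampling across executors.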
Original post: http://www.cnblogs.com/keedor/p/4423996.html