On host1, which acts as the master and a worker, install:

sudo yum install spark-core spark-master spark-worker spark-python

host2 acts as the history-server and a worker:
sudo yum install spark-core spark-worker spark-history-server spark-python
Edit /etc/spark/conf/spark-env.sh and point STANDALONE_SPARK_MASTER_HOST at the real master host:

###
### === IMPORTANT ===
### Change the following to specify a real cluster's Master host
###
export STANDALONE_SPARK_MASTER_HOST='host1'

Note: make sure host1 is wrapped in plain ASCII single quotes, not curly quotes.
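Once the master is running (started further below), you can double-check this setting by launching spark-shell against it and asking the SparkContext for its master URL. A minimal sketch; spark://host1:7077 assumes the standalone master's default port:

// Launched with: spark-shell --master spark://host1:7077
// Inside the shell, `sc` reports the master URL it is bound to.
println(sc.master)   // expect: spark://host1:7077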
Create the application-history directory on HDFS and set its ownership and permissions:

$ sudo -u hdfs hadoop fs -mkdir /user/spark
$ sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory
$ sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
$ sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory

On the Spark client, which in this example is host2, create a new configuration file:
cp /etc/spark/conf/spark-defaults.conf.template /etc/spark/conf/spark-defaults.conf

Add the following two lines to /etc/spark/conf/spark-defaults.conf:
spark.eventLog.dir=/user/spark/applicationHistory
spark.eventLog.enabled=true

Copy hdfs-site.xml into /etc/spark/conf on all machines:
cp /etc/hadoop/conf/hdfs-site.xml /etc/spark/conf/
Now start the services, matching each host's role: spark-master on host1, spark-worker on both hosts, and spark-history-server on host2.

sudo service spark-master start
sudo service spark-worker start
sudo service spark-history-server start
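With the services up, you can also confirm from spark-shell that the event-log settings from spark-defaults.conf were picked up. A minimal sketch; getOption returns None instead of throwing if a key is missing:

// Run inside spark-shell, where the SparkContext `sc` is predefined.
// sc.getConf returns a copy of the configuration the shell was started with.
println(sc.getConf.getOption("spark.eventLog.enabled"))   // expect: Some(true)
println(sc.getConf.getOption("spark.eventLog.dir"))       // expect: Some(/user/spark/applicationHistory)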
[root@host1 impala]# spark-shell
2015-02-10 09:02:07,059 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: root
2015-02-10 09:02:07,069 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: root
2015-02-10 09:02:07,070 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
2015-02-10 09:02:07,072 INFO [main] spark.HttpServer (Logging.scala:logInfo(59)) - Starting HTTP Server
2015-02-10 09:02:07,217 INFO [main] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2015-02-10 09:02:07,350 INFO [main] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SocketConnector@0.0.0.0:59058
2015-02-10 09:02:07,352 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'HTTP class server' on port 59058.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25)
...
2015-02-10 09:02:21,572 INFO [main] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Registered BlockManager
2015-02-10 09:02:22,472 INFO [main] scheduler.EventLoggingListener (Logging.scala:logInfo(59)) - Logging events to file:/user/spark/applicationHistory/local-1423530140986
2015-02-10 09:02:22,672 INFO [main] repl.SparkILoop (Logging.scala:logInfo(59)) - Created spark context..
Spark context available as sc.

scala>
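With the shell up and sc available, a quick smoke test (a minimal sketch, nothing cluster-specific) confirms that jobs actually execute:

// Distribute a small range, square each element, and collect the results
// back to the driver. If this prints, the scheduler and executors work.
val squares = sc.parallelize(1 to 10).map(n => n * n)
println(squares.collect().mkString(", "))   // 1, 4, 9, ..., 100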
Now let's start playing with Spark. We'll redo the wordcount job we previously ran on YARN and see how Spark handles it. First prepare the input:

$ echo "Hello World Bye World" > file0
$ echo "Hello Hadoop Goodbye Hadoop" > file1
$ hdfs dfs -mkdir -p /user/spark/wordcount/input
$ hdfs dfs -put file* /user/spark/wordcount/input
In spark-shell, run:

val file = sc.textFile("hdfs://mycluster/user/spark/wordcount/input")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://mycluster/user/spark/wordcount/output")

No Java code this time, which makes things much simpler; this is Scala.
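As a variation (a sketch of the same job, using only standard RDD calls), you can skip writing to HDFS and pull the counts back to the driver, sorted by frequency:

// collect() brings the (word, count) pairs to the driver as a local Array,
// which we can then sort and print directly.
val sorted = counts.collect().sortBy(-_._2)
sorted.foreach(println)   // e.g. (Hello,2), (Hadoop,2), (World,2), (Bye,1), (Goodbye,1)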
Check the results with Pig's grunt shell:

grunt> ls
hdfs://mycluster/user/spark/wordcount/input    <dir>
hdfs://mycluster/user/spark/wordcount/output   <dir>
grunt> cd output
grunt> ls
hdfs://mycluster/user/spark/wordcount/output/_SUCCESS<r 2>      0
hdfs://mycluster/user/spark/wordcount/output/part-00000<r 2>    8
hdfs://mycluster/user/spark/wordcount/output/part-00001<r 2>    10
hdfs://mycluster/user/spark/wordcount/output/part-00002<r 2>    33
grunt> cat part-00000
(Bye,1)
grunt> cat part-00001
(World,2)
grunt> cat part-00002
(Goodbye,1)
(Hello,2)
(Hadoop,2)
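The same check works from spark-shell instead of Pig: read the output directory back as a text RDD (a sketch, assuming the same mycluster paths as above):

// Spark reads every part-* file under the output directory as one RDD of lines.
sc.textFile("hdfs://mycluster/user/spark/wordcount/output").collect().foreach(println)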
Alex's Hadoop Tutorial for Beginners, Lesson 17: Faster MapReduce - Spark
Original article: http://blog.csdn.net/nsrainbow/article/details/43735737