Tags: rdd zha alt star saveas conf code exist cli
Python code:
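import json
import time
from operator import add

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils


def to_key(message):
    # Fields are separated by "\u0001"; each field is a name and a value separated by "\u0002".
    # Field 0 carries a millisecond timestamp, field 1 the second value used in the key.
    fields = message.split("\u0001")
    millis = float(fields[0].split("\u0002")[1])
    day = time.strftime("%Y-%m-%d", time.localtime(millis / 1000))
    return day + "|" + fields[1].split("\u0002")[1]


# A receiver-based stream occupies one thread, so local mode needs at least two.
sc = SparkContext(master="local[2]", appName="PythonSparkStreamingRokidDtSnCount")
ssc = StreamingContext(sc, 2)  # 2-second batches
zkQuorum = 'localhost:2181'
topic = {'rokid': 1}
groupid = "test-consumer-group"
# createStream yields (key, value) pairs; the value is already a decoded string.
lines = KafkaUtils.createStream(ssc, zkQuorum, groupid, topic)
lines1 = lines.flatMap(lambda x: x[1].split("\n"))
# Assumption: each record is a JSON object. The original used eval(), which executes arbitrary code.
valuedict = lines1.map(json.loads)
message = valuedict.map(lambda x: x["message"])
rdd2 = message.map(lambda x: (to_key(x), 1))
rdd3 = rdd2.reduceByKey(add)
rdd3.saveAsTextFiles("/tmp/wordcount")
rdd3.pprint()
ssc.start()
ssc.awaitTermination()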
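The key-building step of the script is easier to check outside Spark. The following is a minimal sketch, assuming each message is a string of fields separated by "\u0001", where each field is a name and a value separated by "\u0002", the first field holds a millisecond timestamp and the second holds the value to group by. The function name `date_sn_key`, the field names `ts` and `sn`, and the sample message are hypothetical.

```python
import time

FIELD_SEP = "\u0001"   # separates fields within one message
KV_SEP = "\u0002"      # separates a field's name from its value


def date_sn_key(message):
    """Build a 'YYYY-MM-DD|value' key from one delimited message string."""
    fields = message.split(FIELD_SEP)
    millis = float(fields[0].split(KV_SEP)[1])   # first field: timestamp in ms
    sn = fields[1].split(KV_SEP)[1]              # second field: value to group by
    day = time.strftime("%Y-%m-%d", time.localtime(millis / 1000))
    return day + "|" + sn


# Hypothetical sample message; the field names "ts" and "sn" are made up.
sample = "ts\u00021503800000000\u0001sn\u0002SN001"
key = date_sn_key(sample)
print(key)
```

The date part depends on the local time zone of the machine, because the script uses time.localtime rather than UTC.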
Run the Spark Streaming job:
spark/bin/spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.1.0.jar ReadFromKafkaStreaming.py
The spark-streaming-kafka-0-8-assembly_2.11-2.1.0.jar file can be downloaded from:
http://search.maven.org
For reference when getting started:
"""
 Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
 Usage: kafka_wordcount.py <zk> <topic>

 To run this on your local machine, you need to setup Kafka and create a producer first, see
 http://kafka.apache.org/documentation.html#quickstart

 and then run the example
    `$ bin/spark-submit --jars \
      external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
      examples/src/main/python/streaming/kafka_wordcount.py \
      localhost:2181 test`
"""
from __future__ import print_function

import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
        exit(-1)

    sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    ssc = StreamingContext(sc, 1)

    zkQuorum, topic = sys.argv[1:]
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a+b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
bin/spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 examples/src/main/python/streaming/kafka_wordcount.py localhost:2181 test
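Because the word count example applies reduceByKey to each batch separately, pprint() shows the counts for one batch interval at a time, not running totals. Here is a plain-Python sketch of that behaviour with no Spark involved; the helper `count_words` and the two sample batches are hypothetical.

```python
from collections import Counter


def count_words(batch):
    """Mimic flatMap(split) -> map((word, 1)) -> reduceByKey(add) for one batch."""
    words = [word for line in batch for word in line.split(" ")]  # flatMap
    pairs = [(word, 1) for word in words]                          # map
    counts = Counter()
    for word, one in pairs:                                        # reduceByKey
        counts[word] += one
    return dict(counts)


# Two hypothetical batches, as if received in consecutive 1-second intervals.
batch1 = ["This is a message", "This is another message"]
batch2 = ["This is a message"]

result1 = count_words(batch1)
result2 = count_words(batch2)   # counted from scratch; batch1 is not carried over
print(result1)
print(result2)
```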
Using Kafka:
This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use bin\windows\ instead of bin/, and change the script extension to .bat.
> tar -xzf kafka_2.11-0.11.0.0.tgz
> cd kafka_2.11-0.11.0.0
Kafka uses ZooKeeper, so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...
Now start the Kafka server:
> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...
Let's create a topic named "test" with a single partition and only one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.
Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.
Run the producer and then type a few messages into the console to send to the server.
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
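To exercise the streaming script at the top of this post, the console producer can also read prepared lines from standard input instead of typed text. Below is a minimal sketch that writes such lines to a file. It assumes the script expects one JSON object per line with a "message" field, whose content is "\u0001"-separated fields of "\u0002"-separated name/value pairs. The field names `ts` and `sn`, the serial numbers, and the output file name are hypothetical.

```python
import json
import os
import tempfile
import time


def make_record(millis, sn):
    """Return one JSON line with a 'message' field in the delimited format."""
    message = "ts\u0002%d\u0001sn\u0002%s" % (millis, sn)
    return json.dumps({"message": message})


now_ms = int(time.time() * 1000)
records = [make_record(now_ms, sn) for sn in ("SN001", "SN002", "SN001")]

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "rokid_test_messages.txt")
with open(out_path, "w") as f:
    f.write("\n".join(records) + "\n")

print(out_path)
```

The resulting file could then be redirected into the console producer with shell input redirection (`< path`), using the topic name the streaming script subscribes to (rokid in the script above) in place of test.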
Kafka also has a command line consumer that will dump out messages to standard output.
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message
Kafka Spark Streaming example (TODO: not working yet)
Original post: http://www.cnblogs.com/bonelee/p/7435506.html