码迷,mamicode.com
首页 > 其他好文 > 详细

【慕课网实战】Spark Streaming实时流处理项目实战笔记十六之铭文升级版

时间:2018-02-01 19:27:54      阅读:301      评论:0      收藏:0      [点我收藏+]

标签:9.1   too   处理   升级   基础   hbase   web   ==   项目实战   

铭文一级:

linux crontab
网站:http://tool.lu/crontab
每一分钟执行一次的crontab表达式: */1 * * * *

crontab -e
*/1 * * * * /home/hadoop/data/project/log_generator.sh


对接python日志产生器输出的日志到Flume
streaming_project.conf

选型:access.log ==> 控制台输出
exec
memory
logger


exec-memory-logger.sources = exec-source
exec-memory-logger.sinks = logger-sink
exec-memory-logger.channels = memory-channel

exec-memory-logger.sources.exec-source.type = exec
exec-memory-logger.sources.exec-source.command = tail -F /home/hadoop/data/project/logs/access.log
exec-memory-logger.sources.exec-source.shell = /bin/sh -c

exec-memory-logger.channels.memory-channel.type = memory

exec-memory-logger.sinks.logger-sink.type = logger

exec-memory-logger.sources.exec-source.channels = memory-channel
exec-memory-logger.sinks.logger-sink.channel = memory-channel

flume-ng agent \
--name exec-memory-logger \
--conf $FLUME_HOME/conf \
--conf-file /home/hadoop/data/project/streaming_project.conf \
-Dflume.root.logger=INFO,console


日志==>Flume==>Kafka
启动zk:./zkServer.sh start
启动Kafka Server:kafka-server-start.sh -daemon /home/hadoop/app/kafka_2.11-0.9.0.0/config/server.properties
修改Flume配置文件使得flume sink数据到Kafka

streaming_project2.conf
exec-memory-kafka.sources = exec-source
exec-memory-kafka.sinks = kafka-sink
exec-memory-kafka.channels = memory-channel

exec-memory-kafka.sources.exec-source.type = exec
exec-memory-kafka.sources.exec-source.command = tail -F /home/hadoop/data/project/logs/access.log
exec-memory-kafka.sources.exec-source.shell = /bin/sh -c

exec-memory-kafka.channels.memory-channel.type = memory

exec-memory-kafka.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
exec-memory-kafka.sinks.kafka-sink.brokerList = hadoop000:9092
exec-memory-kafka.sinks.kafka-sink.topic = streamingtopic
exec-memory-kafka.sinks.kafka-sink.batchSize = 5
exec-memory-kafka.sinks.kafka-sink.requiredAcks = 1

exec-memory-kafka.sources.exec-source.channels = memory-channel
exec-memory-kafka.sinks.kafka-sink.channel = memory-channel

flume-ng agent \
--name exec-memory-kafka \
--conf $FLUME_HOME/conf \
--conf-file /home/hadoop/data/project/streaming_project2.conf \
-Dflume.root.logger=INFO,console

kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic streamingtopic

数据清洗操作:从原始日志中取出我们所需要的字段信息就可以了

数据清洗结果类似如下:
ClickLog(46.30.10.167,20171022151701,128,200,-)
ClickLog(143.132.168.72,20171022151701,131,404,-)
ClickLog(10.55.168.87,20171022151701,131,500,-)
ClickLog(10.124.168.29,20171022151701,128,404,-)
ClickLog(98.30.87.143,20171022151701,131,404,-)
ClickLog(55.10.29.132,20171022151701,146,404,http://www.baidu.com/s?wd=Storm实战)
ClickLog(10.87.55.30,20171022151701,130,200,http://www.baidu.com/s?wd=Hadoop基础)
ClickLog(156.98.29.30,20171022151701,146,500,https://www.sogou.com/web?query=大数据面试)
ClickLog(10.72.87.124,20171022151801,146,500,-)
ClickLog(72.124.167.156,20171022151801,112,404,-)

到数据清洗完为止,日志中只包含了实战课程的日志


补充一点:希望你们的机器配置被太低
Hadoop/ZK/HBase/Spark Streaming/Flume/Kafka
hadoop000: 8Core 8G

 

铭文二级:

 

private String initDate() {  
       Date d = new Date();  
       FastDateFormat fdf = FastDateFormat.getInstance("yyyy-MM-dd HH:mm:ss");  
       return fdf.format(d);  
   }  

  

 

【慕课网实战】Spark Streaming实时流处理项目实战笔记十六之铭文升级版

标签:9.1   too   处理   升级   基础   hbase   web   ==   项目实战   

原文地址:https://www.cnblogs.com/kkxwz/p/8400548.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!