码迷,mamicode.com
首页 > Web开发 > 详细

实时事件统计项目:优化flume:用file channel代替mem channel

时间:2016-12-28 19:40:54      阅读:207      评论:0      收藏:0      [点我收藏+]

标签:duration   href   ref   .com   inline   lines   hsi   auto   分享   

背景:利用kafka+flume+morphline+solr做实时统计

solr从12月23号开始一直没有数据。查看日志发现,因为有一个同事加了一条格式错误的埋点数据,导致大量error。

据推断,是因为使用mem channel占满,消息来不及处理,导致新来的数据都丢失了。

技术分享

修改flume使用file channel:

kafka2solr.sources = source_from_kafka
kafka2solr.channels = file_channel
kafka2solr.sinks = solrSink

# For each one of the sources, the type is defined  
kafka2solr.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
kafka2solr.sources.source_from_kafka.channels = file_channel
kafka2solr.sources.source_from_kafka.batchSize = 100
kafka2solr.sources.source_from_kafka.useFlumeEventFormat=false
kafka2solr.sources.source_from_kafka.kafka.bootstrap.servers= kafkanode0:9092,kafkanode1:9092,kafkanode2:9092
kafka2solr.sources.source_from_kafka.kafka.topics = eventCount
kafka2solr.sources.source_from_kafka.kafka.consumer.group.id = flume_solr_caller
kafka2solr.sources.source_from_kafka.kafka.consumer.auto.offset.reset=latest

# file channel  
kafka2solr.channels.file_channel.type = file
kafka2solr.channels.file_channel.checkpointDir = /var/log/flume-ng/checkpoint
kafka2solr.channels.file_channel.dataDirs = /var/log/flume-ng/data


kafka2solr.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
kafka2solr.sinks.solrSink.channel = file_channel
#kafka2solr.sinks.solrSink.batchSize = 1000
#kafka2solr.sinks.solrSink.batchDurationMillis = 1000
kafka2solr.sinks.solrSink.morphlineFile = morphlines.conf
kafka2solr.sinks.solrSink.morphlineId=morphline1
kafka2solr.sinks.solrSink.isIgnoringRecoverableExceptions=true

使得数据持久化到磁盘不会丢失。

实时事件统计项目:优化flume:用file channel代替mem channel

标签:duration   href   ref   .com   inline   lines   hsi   auto   分享   

原文地址:http://www.cnblogs.com/arli/p/6230310.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!