
Using Flume

Published: 2016-09-11


1. Installing and configuring Flume

  1.1 Set JAVA_HOME by editing /opt/cdh/flume-1.5.0-cdh5.3.6/conf/flume-env.sh
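    Inside flume-env.sh the only required change is pointing JAVA_HOME at the local JDK; the path below is an assumption for illustration:

```shell
# conf/flume-env.sh -- the JDK path here is hypothetical; point it
# at whatever JDK is installed on the node
export JAVA_HOME=/usr/java/jdk1.7.0_67
```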


  1.2 Integrate with HDFS

    1.2.1 Copy the HDFS client jars into /opt/cdh/flume-1.5.0-cdh5.3.6/lib:

      commons-configuration-1.6.jar

      hadoop-common-2.5.0-cdh5.3.6.jar

      hadoop-hdfs-2.5.0-cdh5.3.6.jar

      hadoop-auth-2.5.0-cdh5.3.6.jar

  1.3 Check the Flume version: bin/flume-ng version


2. First agent example: a netcat source, a memory channel, and a logger sink

  2.1 Create /opt/cdh/flume-1.5.0-cdh5.3.6/conf/a1-conf.properties:

    

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

# Define the agent's three components: source, channel, sink
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Define the source
a1.sources.s1.type=netcat
a1.sources.s1.bind=life-hadoop.life.com
a1.sources.s1.port=44444

# Define the channel
a1.channels.c1.type=memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Define the sink
a1.sinks.k1.type=logger


# Wire source and sink to the channel
a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1
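Under the hood, the netcat source accepts TCP connections, turns each newline-terminated line into one event, and by default answers "OK" per event. A self-contained Python sketch of that framing (the server below is a stand-in for illustration, not Flume itself):

```python
import socket
import socketserver
import threading

# Stand-in for the netcat source: each newline-terminated line that
# arrives on the bound port becomes one event body, acknowledged "OK".
events = []

class NetcatLikeHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for raw in self.rfile:
            line = raw.rstrip(b"\r\n")
            if line:
                events.append(line)        # one line -> one event
                self.wfile.write(b"OK\n")  # the source's default ack

server = socketserver.TCPServer(("127.0.0.1", 0), NetcatLikeHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# What the telnet session in step 2.4 does, in miniature:
with socket.create_connection(("127.0.0.1", port)) as conn:
    conn.sendall(b"hello flume\r\nsecond event\r\n")
    acks = b""
    while acks.count(b"\n") < 2:  # wait for both OK replies
        acks += conn.recv(64)
server.shutdown()
print(events)
```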

  2.2 Install telnet

    sudo rpm -ivh xinetd-2.3.14-38.el6.x86_64.rpm telnet-0.17-47.el6_3.1.x86_64.rpm telnet-server-0.17-47.el6_3.1.x86_64.rpm

    sudo /etc/rc.d/init.d/xinetd restart


  2.3 Start the agent

    bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/a1-conf.properties -Dflume.root.logger=DEBUG,console


  2.4 Connect with telnet and send test lines

    telnet life-hadoop.life.com 44444


3. Second agent: collect Hive's log into HDFS in real time

  3.1 Create /opt/cdh/flume-1.5.0-cdh5.3.6/conf/hive-tail-conf.properties:

    

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a2'
# Collect Hive's log into HDFS in real time

# Define the agent's three components: source, channel, sink
a2.sources = s2
a2.channels = c2
a2.sinks = k2

# Define the source
a2.sources.s2.type=exec
a2.sources.s2.command=tail -f /opt/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log


# Define the channel
a2.channels.c2.type=memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Define the sink
a2.sinks.k2.type=hdfs
a2.sinks.k2.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail
# Number of events flushed to HDFS per batch (default: 100)
a2.sinks.k2.hdfs.batchSize=10
# File type (default: SequenceFile)
a2.sinks.k2.hdfs.fileType=DataStream
# Serialization format for written records (default: Writable)
a2.sinks.k2.hdfs.writeFormat=Text


# Wire source and sink to the channel
a2.sources.s2.channels=c2
a2.sinks.k2.channel=c2
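An exec source simply runs the configured command and turns every line it prints to stdout into an event. A rough, self-contained sketch of that behaviour (plain tail on a throwaway file instead of tail -f, so the sketch terminates):

```python
import os
import subprocess
import tempfile

# Stand-in for the exec source: run a command and treat each stdout
# line as one event. The agent above uses "tail -f"; plain "tail"
# is used here so the sketch exits on its own.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("first hive log line\nsecond hive log line\n")
    log_path = f.name

out = subprocess.run(["tail", "-n", "2", log_path],
                     capture_output=True, text=True, check=True)
tail_events = out.stdout.splitlines()
print(tail_events)
os.unlink(log_path)
```

Note that the exec source offers no delivery guarantee: lines written while the agent is down are lost, and the memory channel likewise drops its contents on a crash.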

  3.2 Start the Flume agent to begin collecting

    bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/hive-tail-conf.properties -Dflume.root.logger=DEBUG,console

  3.3 Start a Hive client and watch the /user/yanglin/flume/hive-tail directory in HDFS for changes


  3.4 For an HA Hadoop cluster, copy core-site.xml and hdfs-site.xml into Flume's conf directory
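    With those client configs in place, the sink path can name the HDFS nameservice instead of a single NameNode host, so failover is transparent to Flume. A sketch, where the nameservice name ns1 is an assumption for illustration:

```properties
# Hypothetical nameservice "ns1" -- it must match dfs.nameservices
# in the hdfs-site.xml copied into Flume's conf directory
a2.sinks.k2.hdfs.path = hdfs://ns1/user/yanglin/flume/hive-tail
```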

  3.5 To have HDFS create time-based directories automatically, use escape sequences (not regular expressions) in a2.sinks.k2.hdfs.path:

    hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail-time/%Y%m%d

    You must also set a2.sinks.k2.hdfs.useLocalTimeStamp=true
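    The escapes are strftime-style and are resolved from the event's timestamp header, which useLocalTimeStamp=true fills from the agent's clock. A quick illustration of the expansion:

```python
from datetime import datetime

# Illustrative only: expand %Y%m%d the way the hdfs sink does,
# using an example local timestamp.
ts = datetime(2016, 9, 11, 0, 2, 52)
path = "/user/yanglin/flume/hive-tail-time/" + ts.strftime("%Y%m%d")
print(path)  # /user/yanglin/flume/hive-tail-time/20160911
```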

4. Third agent example: use a spooling directory source to watch a directory and pull the files that match into HDFS

  4.1 Create /opt/cdh/flume-1.5.0-cdh5.3.6/conf/spooling-conf.properties:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a3'
# Watch a directory and collect matching files into HDFS

# Define the agent's three components: source, channel, sink
a3.sources = s3
a3.channels = c3
a3.sinks = k3

# Define the source
a3.sources.s3.type=spooldir
a3.sources.s3.spoolDir=/opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs
# Suffix appended to a file once it has been collected (default: .COMPLETED)
a3.sources.s3.fileSuffix=.delete
# Regex for files in the directory to skip (default: collect all files)
a3.sources.s3.ignorePattern=^(.)*\\.log$


# Define the channel
a3.channels.c3.type=file
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

a3.channels.c3.checkpointDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/checkpoint
a3.channels.c3.dataDirs = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/data

# Define the sink
a3.sinks.k3.type=hdfs
a3.sinks.k3.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/spooling-logs/%Y%m%d
# Number of events flushed to HDFS per batch (default: 100)
a3.sinks.k3.hdfs.batchSize=10
# File type (default: SequenceFile)
a3.sinks.k3.hdfs.fileType=DataStream
# Serialization format for written records (default: Writable)
a3.sinks.k3.hdfs.writeFormat=Text
# Stamp event headers with the local timestamp (needed for %Y%m%d in the path)
a3.sinks.k3.hdfs.useLocalTimeStamp=true


# Wire source and sink to the channel
a3.sources.s3.channels=c3
a3.sinks.k3.channel=c3
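The ignorePattern in this config is a Java regex, and the doubled backslash is properties-file escaping for a literal dot; Python's re handles the same pattern identically here, so the filter can be sketched as:

```python
import re

# Same pattern as a3.sources.s3.ignorePattern after properties
# unescaping: any file name ending in ".log" is skipped.
ignore = re.compile(r"^(.)*\.log$")
names = ["hive.log", "access.2016-09-11", "notes.txt", "app.log"]
collected = [n for n in names if not ignore.match(n)]
print(collected)  # ['access.2016-09-11', 'notes.txt']
```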

  4.2 Start the Flume agent to begin watching and collecting

    bin/flume-ng agent --conf conf/ --name a3 --conf-file conf/spooling-conf.properties -Dflume.root.logger=DEBUG,console

  4.3 Check the collected results



Original post: http://www.cnblogs.com/lifeone/p/5859545.html
