1. Flume installation and configuration
1.1 Configure JAVA_HOME by editing /opt/cdh/flume-1.5.0-cdh5.3.6/conf/flume-env.sh
1.2 Configure HDFS integration
1.2.1 Add the HDFS-related jars to the /opt/cdh/flume-1.5.0-cdh5.3.6/lib directory:
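A minimal flume-env.sh edit might look like the fragment below; the JDK path is only an example, adjust it to your own installation:

```shell
# conf/flume-env.sh -- point Flume at the JDK (path is an example)
export JAVA_HOME=/opt/modules/jdk1.7.0_67
```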
commons-configuration-1.6.jar
hadoop-common-2.5.0-cdh5.3.6.jar
hadoop-hdfs-2.5.0-cdh5.3.6.jar
hadoop-auth-2.5.0-cdh5.3.6.jar
1.3 Check the Flume version: bin/flume-ng version
2. First agent example: a netcat source, a memory channel, and a logger sink
2.1 Create /opt/cdh/flume-1.5.0-cdh5.3.6/conf/a1-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

# Define the three agent components: source, channel, sink
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Define the source
a1.sources.s1.type = netcat
a1.sources.s1.bind = life-hadoop.life.com
a1.sources.s1.port = 44444

# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Define the sink
a1.sinks.k1.type = logger

# Wire the three components together
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
2.2 Install telnet
sudo rpm -ivh xinetd-2.3.14-38.el6.x86_64.rpm telnet-0.17-47.el6_3.1.x86_64.rpm telnet-server-0.17-47.el6_3.1.x86_64.rpm
sudo /etc/rc.d/init.d/xinetd restart
2.3 Start the agent
bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/a1-conf.properties -Dflume.root.logger=DEBUG,console
2.4 Connect with telnet and test
telnet life-hadoop.life.com 44444
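If telnet is not available, a short Python client can send test events to the netcat source instead. This is a sketch, not part of the original walkthrough; the hostname and port come from a1-conf.properties above:

```python
import socket

def frame(line: str) -> bytes:
    # The netcat source treats each newline-terminated line as one event.
    return (line + "\n").encode("utf-8")

def send_event(host: str, port: int, line: str) -> None:
    # Open a TCP connection to the netcat source and send a single event.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(frame(line))

# Usage (host/port from a1-conf.properties):
#   send_event("life-hadoop.life.com", 44444, "hello flume")
```

Each event should then appear in the agent's console output via the logger sink.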
3. Second agent example: collect the Hive log into HDFS in real time
3.1 Create /opt/cdh/flume-1.5.0-cdh5.3.6/conf/hive-tail-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a2'
# Collects the Hive log into HDFS in real time

# Define the three agent components: source, channel, sink
a2.sources = s2
a2.channels = c2
a2.sinks = k2

# Define the source
a2.sources.s2.type = exec
a2.sources.s2.command = tail -f /opt/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log

# Define the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Define the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail
# Number of events flushed to HDFS per batch (default: 100)
a2.sinks.k2.hdfs.batchSize = 10
# File type (default: SequenceFile)
a2.sinks.k2.hdfs.fileType = DataStream
# Write format (default: Writable)
a2.sinks.k2.hdfs.writeFormat = Text

# Wire the three components together
a2.sources.s2.channels = c2
a2.sinks.k2.channel = c2
3.2 Start the Flume agent to begin collecting
bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/hive-tail-conf.properties -Dflume.root.logger=DEBUG,console
3.3 Start the Hive client and watch the /user/yanglin/flume/hive-tail directory on HDFS for changes
3.4 For an HA-enabled Hadoop cluster, copy core-site.xml and hdfs-site.xml into the conf directory of the Flume installation
3.5 To have HDFS directories created automatically by time, use escape sequences (not regular expressions) in a2.sinks.k2.hdfs.path, e.g.:
hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail-time/%Y%m%d
When doing so, you must also set a2.sinks.k2.hdfs.useLocalTimeStamp=true, since the escape sequences otherwise require a timestamp header on each event.
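The %Y%m%d escapes are expanded by the HDFS sink from the event timestamp. The resulting directory name can be previewed with Python's strftime, which uses the same directives for this pattern (an illustration only, not Flume code):

```python
from datetime import datetime

# An event stamped 2016-09-10 would land under .../hive-tail-time/20160910
event_time = datetime(2016, 9, 10, 14, 30)
print(event_time.strftime("%Y%m%d"))  # → 20160910
```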
4. Third agent example: use a spooling directory source to watch a directory and load the matching files into HDFS
4.1 Create /opt/cdh/flume-1.5.0-cdh5.3.6/conf/spooling-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a3'
# Watches the given directory and collects matching files into HDFS

# Define the three agent components: source, channel, sink
a3.sources = s3
a3.channels = c3
a3.sinks = k3

# Define the source
a3.sources.s3.type = spooldir
a3.sources.s3.spoolDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs
# Suffix appended to a file once it has been collected (default: .COMPLETED)
a3.sources.s3.fileSuffix = .delete
# Regex for files in the directory to skip (default: collect everything)
a3.sources.s3.ignorePattern = ^(.)*\\.log$

# Define the channel
a3.channels.c3.type = file
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100
a3.channels.c3.checkpointDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/checkpoint
a3.channels.c3.dataDirs = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/data

# Define the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/spooling-logs/%Y%m%d
# Number of events flushed to HDFS per batch (default: 100)
a3.sinks.k3.hdfs.batchSize = 10
# File type (default: SequenceFile)
a3.sinks.k3.hdfs.fileType = DataStream
# Write format (default: Writable)
a3.sinks.k3.hdfs.writeFormat = Text
# Use the local time for the escape sequences instead of a timestamp header
a3.sinks.k3.hdfs.useLocalTimeStamp = true

# Wire the three components together
a3.sources.s3.channels = c3
a3.sinks.k3.channel = c3
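The ignorePattern above is a Java regular expression (after properties-file unescaping, \\. becomes \.). Its effect on a few sample filenames can be sanity-checked with Python's re module, whose syntax is compatible for this pattern; this sketch is for illustration only:

```python
import re

# ^(.)*\.log$ : skip any file whose name ends in ".log"
ignore = re.compile(r"^(.)*\.log$")

names = ["hive.log", "access.log.1", "data.txt"]
collected = [n for n in names if not ignore.match(n)]
print(collected)  # → ['access.log.1', 'data.txt']
```

So hive.log is skipped, while files that merely contain ".log" in the middle of the name are still collected.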
4.2 Start the Flume agent to watch and collect
bin/flume-ng agent --conf conf/ --name a3 --conf-file conf/spooling-conf.properties -Dflume.root.logger=DEBUG,console
4.3 Check the collected results on HDFS
Original source: http://www.cnblogs.com/lifeone/p/5859545.html