Installing and Using the Flume Log Collection Framework

Date: 2017-02-24 16:27:40

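1. Introduction to Flume

1.1. Flume overview

Flume is a distributed, reliable, and highly available system (according to the author, only the older Flume OG offered high availability) for collecting, transporting, and aggregating large volumes of log data.
Flume can collect source data in many forms, such as files and socket packets,
    and can deliver the collected data to external storage systems such as HDFS, HBase, Hive, and Kafka.
    Ordinary collection requirements can be met with simple Flume configuration alone.
Flume can also be extended with custom components for special cases, so it suits most day-to-day data collection scenarios.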

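1.2. How it works

1. The central role in a Flume deployment is the agent; a Flume collection system is formed by connecting agents to one another.
2. Each agent acts as a data courier and contains three components:
a) Source: the collection end, which connects to the data source and reads data from it
b) Sink: the delivery end, which passes data on to the next-level agent or to the final storage system
c) Channel: the data pipe inside the agent, which carries data from the source to the sink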
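
As a rough illustration of how the three components are wired together, the sketch below writes a minimal agent definition to a file. The agent name a1, the component names r1/c1/k1, the netcat source on localhost:44444, and the output path are illustrative assumptions, not part of the setup used later in this article.

```shell
#!/usr/bin/env bash
# Write a minimal single-agent Flume config: netcat source -> memory channel -> logger sink.
# All names (a1, r1, c1, k1), the port and the output path are illustrative assumptions.
write_minimal_agent_conf() {
  local conf_file=$1
  cat > "$conf_file" <<'EOF'
# One source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens on a TCP port and turns each received line into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers events in memory between source and sink
a1.channels.c1.type = memory

# Sink: writes events to the agent's log
a1.sinks.k1.type = logger

# Wiring: a source may feed several channels, a sink drains exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
}

write_minimal_agent_conf "${TMPDIR:-/tmp}/minimal-agent.conf"
```

Note the asymmetry in the last two lines: the source property is plural (channels) while the sink property is singular (channel).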

1.3. Flume collection topologies

1.3.1. Simple topology

A single agent collects the data.

1.3.2. Complex topology

Multiple levels of agents chained in series.

2. Installing Flume

2.1. Extract the Flume archive to the target directory

[root@node02 software]# tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /opt/modules/

2.2. Rename the directory

[root@node02 modules]# mv apache-flume-1.6.0-bin flume-1.6.0
You have new mail in /var/spool/mail/root
[root@node02 modules]# ll
total 24
drwxr-xr-x.  9 matrix matrix 4096 Jan  7 13:44 elasticsearch-2.4.2
drwxr-xr-x.  7 root   root   4096 Jan 24 13:09 flume-1.6.0
drwxr-xr-x. 12 matrix matrix 4096 Jan 23 21:00 hadoop-2.5.1
drwxr-xr-x.  8 root   root   4096 Jan 23 18:43 hive-1.2.1
drwxr-xr-x.  3 matrix matrix 4096 Dec 19 16:01 journalnode
drwxr-xr-x. 12 matrix matrix 4096 Dec 17 21:20 zookeeper

2.3. Configure the Flume environment variables

[root@node02 ~]# ls -a
.   anaconda-ks.cfg  .bash_logout   .bashrc  .hivehistory  install.log.syslog      .mysql_history  .ssh     zookeeper.out
..  .bash_history    .bash_profile  .cshrc   install.log   jdk-7u79-linux-x64.rpm  .pki            .tcshrc
[root@node02 ~]# vi .bash_profile

export FLUME_HOME=/opt/modules/flume-1.6.0
export PATH=$PATH:$FLUME_HOME/bin

2.4. Apply the configuration

[root@node02 ~]# source .bash_profile
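
A quick sanity check after sourcing the profile can catch a mistyped path early. The following is a minimal sketch; check_flume_home is a hypothetical helper name, and it only assumes that the launcher script lives at bin/flume-ng under the Flume home directory, as it does in the layout above.

```shell
#!/usr/bin/env bash
# Verify that a Flume home directory contains an executable bin/flume-ng.
# check_flume_home is a hypothetical helper; it defaults to $FLUME_HOME.
check_flume_home() {
  local home=${1:-$FLUME_HOME}
  if [ -z "$home" ]; then
    echo "FLUME_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$home/bin/flume-ng" ]; then
    echo "no executable flume-ng under $home/bin" >&2
    return 1
  fi
  echo "flume-ng found at $home/bin/flume-ng"
}

# Example:
# check_flume_home && flume-ng version
```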

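2.5. Collecting a file into HDFS

Requirement: a business system writes its logs with log4j, and the log file keeps growing.
The data appended to the log file must be collected into HDFS in near real time.

Based on this requirement, first define the following three elements:
Source: watch the file for new content, using exec 'tail -F file'
Sink: the HDFS file system, using the hdfs sink
Channel: the pipe between source and sink; either a file channel or a memory channel can be used

2.5.1. Write the Flume configuration file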

[root@node02 flume-1.6.0]# vi conf/tail-hdfs.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

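# "exec" means the source runs a command
# Describe/configure the source
a1.sources.r1.type = exec
# tail -F follows the file by name (and reopens it after rotation); tail -f follows the open file (its inode)
a1.sources.r1.command = tail -F /home/hadoop/log/test.log
a1.sources.r1.channels = c1

# Describe the sink
# Delivery target
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Target directory; Flume substitutes the time escapes (%y, %m, %d, %H, %M)
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/
# Prefix for the file names
a1.sinks.k1.hdfs.filePrefix = events-

# Round the timestamp down to 10 minutes, so the directory changes every 10 minutes
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

# Seconds to wait before rolling the current file
a1.sinks.k1.hdfs.rollInterval = 3

# File size that triggers a roll (bytes)
a1.sinks.k1.hdfs.rollSize = 500

# Number of events written before the file is rolled
a1.sinks.k1.hdfs.rollCount = 20

# Number of events written to the file before it is flushed to HDFS
a1.sinks.k1.hdfs.batchSize = 5

# Use the local time (instead of a timestamp header on the event) when formatting the directory
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# File type to write; the default is SequenceFile, DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream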

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
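
To see what the time escapes in hdfs.path resolve to once rounding is applied, the sketch below reproduces the calculation in shell for a given epoch time. It is only an approximation of the behaviour described by the round, roundValue, and roundUnit settings, not Flume's own code; flume_bucket is a hypothetical helper name, and GNU date is assumed for the "-d @EPOCH" syntax.

```shell
#!/usr/bin/env bash
# Approximate how %y-%m-%d/%H%M resolves when round=true, roundValue=10, roundUnit=minute.
# Assumes GNU date (for "-d @EPOCH"); flume_bucket is a hypothetical helper name.
flume_bucket() {
  local epoch=$1 round_minutes=${2:-10}
  local day hour minute rounded
  day=$(date -d "@${epoch}" '+%y-%m-%d')
  hour=$(date -d "@${epoch}" '+%H')
  minute=$(date -d "@${epoch}" '+%M')
  # Force base 10 so that "08" and "09" are not parsed as octal
  rounded=$(( 10#$minute / round_minutes * round_minutes ))
  printf '%s/%s%02d\n' "$day" "$hour" "$rounded"
}

# Directory suffix for the current time in the local time zone
flume_bucket "$(date +%s)"
```

For example, TZ=CST-8 flume_bucket 1485236169 (the second in which the first file in the listing of section 2.8 was created, assuming the server clock was at UTC+8) prints 17-01-24/1330, which matches the directory shown there.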

2.6. Write data to the monitored file

[root@node02 flume-1.6.0]# mkdir -p /home/hadoop/log
[root@node02 flume-1.6.0]# touch /home/hadoop/log/test.log
[root@node02 ~]# while true
> do
> echo 11111111111 >> /home/hadoop/log/test.log
> sleep 0.6
> done

[root@node02 flume-1.6.0]# tail -f /home/hadoop/log/test.log

2.7. Start Flume log collection

Note: make sure Hadoop HDFS is running; if it is not, start it first.
[root@node02 flume-1.6.0]# ./bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1
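
The HDFS check in the note above can be scripted. This is a minimal sketch under the assumption that the hdfs client is on PATH; run_if_hdfs_up is a hypothetical helper name, and it only tests that a root listing succeeds before running whatever command it is given.

```shell
#!/usr/bin/env bash
# Run a command only if HDFS answers a simple listing request.
# run_if_hdfs_up is a hypothetical helper; it relies on the hdfs client being on PATH.
run_if_hdfs_up() {
  if hdfs dfs -ls / > /dev/null 2>&1; then
    "$@"
  else
    echo "HDFS is not reachable; start it first (for example with sbin/start-dfs.sh)" >&2
    return 1
  fi
}

# Example (run from the Flume home directory):
# run_if_hdfs_up ./bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
```

Adding -Dflume.root.logger=INFO,console, as the node03 command in section 3.3.1 does, sends the agent log to the terminal, which makes startup problems easier to spot.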

2.8. View the directories Flume created on HDFS (Hadoop Web UI or hdfs dfs)

[root@node02 hadoop-2.5.1]# ./bin/hdfs dfs -ls -R /flume
drwxr-xr-x   - root supergroup          0 2017-01-24 13:36 /flume/events
drwxr-xr-x   - root supergroup          0 2017-01-24 13:36 /flume/events/17-01-24
drwxr-xr-x   - root supergroup          0 2017-01-24 13:38 /flume/events/17-01-24/1330
-rw-r--r--   3 root supergroup        140 2017-01-24 13:36 /flume/events/17-01-24/1330/events-.1485236169660
-rw-r--r--   3 root supergroup        140 2017-01-24 13:36 /flume/events/17-01-24/1330/events-.1485236169661
-rw-r--r--   3 root supergroup         70 2017-01-24 13:36 /flume/events/17-01-24/1330/events-.1485236169662
-rw-r--r--   3 root supergroup         77 2017-01-24 13:36 /flume/events/17-01-24/1330/events-.1485236189653
-rw-r--r--   3 root supergroup         77 2017-01-24 13:36 /flume/events/17-01-24/1330/events-.1485236195683
-rw-r--r--   3 root supergroup         77 2017-01-24 13:36 /flume/events/17-01-24/1330/events-.1485236201641
-rw-r--r--   3 root supergroup         77 2017-01-24 13:36 /flume/events/17-01-24/1330/events-.1485236207790
-rw-r--r--   3 root supergroup         84 2017-01-24 13:36 /flume/events/17-01-24/1330/events-.1485236213809
-rw-r--r--   3 root supergroup         70 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236219808
-rw-r--r--   3 root supergroup         84 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236225867
-rw-r--r--   3 root supergroup         77 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236231852
-rw-r--r--   3 root supergroup         70 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236238116
-rw-r--r--   3 root supergroup         84 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236244133
-rw-r--r--   3 root supergroup         63 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236250160
-rw-r--r--   3 root supergroup         56 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236254744
-rw-r--r--   3 root supergroup         42 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236260456
-rw-r--r--   3 root supergroup         35 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236264210
-rw-r--r--   3 root supergroup         35 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236267832
-rw-r--r--   3 root supergroup         49 2017-01-24 13:37 /flume/events/17-01-24/1330/events-.1485236271410
-rw-r--r--   3 root supergroup         84 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236275630
-rw-r--r--   3 root supergroup         77 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236281581
-rw-r--r--   3 root supergroup         77 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236287587
-rw-r--r--   3 root supergroup         84 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236293646
-rw-r--r--   3 root supergroup         70 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236299642
-rw-r--r--   3 root supergroup         49 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236305888
-rw-r--r--   3 root supergroup         70 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236311177
-rw-r--r--   3 root supergroup         35 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236315362
-rw-r--r--   3 root supergroup         77 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236320019
-rw-r--r--   3 root supergroup         77 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236324629
-rw-r--r--   3 root supergroup         42 2017-01-24 13:38 /flume/events/17-01-24/1330/events-.1485236330636
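
The listing contains many very small files because the sink rolls a file as soon as any one of the three limits in the configuration is reached (rollInterval = 3 seconds, rollSize = 500 bytes, rollCount = 20 events). A rough way to quantify this is to count the files and sum their sizes; the sketch below does so from the listing text, with summarize_listing being a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarise "hdfs dfs -ls -R" output read from stdin: number of files and total bytes.
# summarize_listing is a hypothetical helper name.
summarize_listing() {
  awk '
    # File entries start with "-" in the permission column; directories start with "d"
    $1 ~ /^-/ { files++; bytes += $5 }
    END { printf "files=%d bytes=%d\n", files, bytes }
  '
}

# Example:
# hdfs dfs -ls -R /flume | summarize_listing
```

For anything beyond a demo, larger roll limits would normally be chosen so that HDFS is not filled with tiny files.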

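3. Connecting multiple Flume agents

The first agent reads data from the tail command and sends it to an avro port (tail->avro).
A second node runs an avro source that relays the data on to external storage (avro->log).

3.1. Install Flume on node03

Repeat the installation steps from section 2 (2.1 to 2.4) on node03.

3.2. Configure Flume on node02

Read data from the tail command and send it to the avro port: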
[root@node02 flume-1.6.0]# vi conf/tail-avro.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/log/test.log
a1.sources.r1.channels = c1

# Describe the sink
# The avro sink is the sending side (an avro client): hostname is not this machine but the address of the receiving service on node03
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.230.12
a1.sinks.k1.port = 4141
a1.sinks.k1.batch-size = 2

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

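3.2.1. Run Flume on node02 to send data to node03

Note: the avro sink can only deliver once the avro source on node03 (section 3.3) is listening; until then it logs connection errors and retries.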

[root@node02 flume-1.6.0]# ./bin/flume-ng agent -c conf -f conf/tail-avro.conf -n a1

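3.3. Configure Flume on node03

Configure an avro source that receives the relayed data and hands it to a sink. The file is named avro-hdfs.conf, but the sink configured here is a logger, so events are written to the agent's log (avro->log) rather than to HDFS.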

[root@node03 flume-1.6.0]# vi conf/avro-hdfs.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# The avro source is the receiving service; bind it to this machine (0.0.0.0 listens on all interfaces)
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3.3.1. Run Flume on node03 to receive data from node02

[root@node03 flume-1.6.0]# ./bin/flume-ng agent -c conf -f conf/avro-hdfs.conf -n a1 -Dflume.root.logger=INFO,console

Port 4141 should now be listening:
[root@node03 ~]# netstat -nltp
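
Rather than reading the netstat output by eye, the check can be reduced to an exit status. The sketch below filters the listing for a port in LISTEN state; port_is_listening is a hypothetical helper name, and the column positions assume the usual Linux netstat -nlt layout.

```shell
#!/usr/bin/env bash
# Succeed if the given TCP port appears in LISTEN state in "netstat -nlt" output read from stdin.
# port_is_listening is a hypothetical helper name.
port_is_listening() {
  local port=$1
  awk -v port="$port" '
    # Column 4 is the local address (for example 0.0.0.0:4141 or :::4141), column 6 the state
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example:
# netstat -nlt | port_is_listening 4141 && echo "avro source is listening"
```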

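3.4. Send data

Append lines to /home/hadoop/log/test.log on node02 (for example with the loop from section 2.6); the events should then show up in the console log of the agent running on node03.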

Original article: http://blog.csdn.net/qq_25371579/article/details/56840249
