flume安装与使用

时间：2019-09-02 09:35:39 阅读：86 评论：0 收藏：0 [点我收藏+]

标签：load 实战案例 figure 安装配置数据监控 cal arc 三台表达式

日志采集框架Flume

Flume介绍

概述

Flume是一个分布式、可靠、和高可用的海量日志采集、聚合和传输的系统。

Flume可以采集文件，socket数据包、文件、文件夹、kafka等各种形式源数据，又可以将采集到的数据(下沉sink)输出到HDFS、hbase、hive、kafka等众多外部存储系统中
运行机制

Flume分布式系统最核心的角色是agent，flume采集系统就是由一个个agent所连接起来而成

每一个agent相当于一个数据传递员，内部有三个组件：
- Source：采集组件，用于跟数据源对接，获取数据
- Sink：下沉组件，用于往下一级agent传递数据或者往最终存储系统传递数据
- Channel：传输通道组件，用于从source将数据传递到sink
采集系统结构图
- 简单结构
- 复杂结构
  
  多级agent之间串联

Flume实战案例

安装部署

第一步：下载解压修改配置文件

Flume的安装非常简单，只需要解压即可，当然，前提是已有hadoop环境

# 上传安装包到数据源所在节点上 这里采用在第三台机器来进行安装 软件目录 => flume-ng-1.6.0-cdh5.14.0.tar.gz
tar -zxvf flume-ng-1.6.0-cdh5.14.0.tar.gz -C ../servers/
cd ../servers/apache-flume-1.6.0-cdh5.14.0-bin/conf/
cp flume-env.sh.template flume-env.sh
vim flume-env.sh #只添加一个java环境就可以了
  export JAVA_HOME=/export/servers/jdk1.8.0_141

第二步：开发配置文件

# 根据数据采集的需求配置采集方案，描述在配置文件中(文件名可任意自定义)
# 配置我们的网络收集的配置文件
# 在flume的conf目录下新建一个配置文件（采集方案）
vim /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf/netcat-logger.conf
    # 定义这个agent中各组件的名字
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # 描述和配置source组件：r1
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = 192.168.52.120
    a1.sources.r1.port = 44444

    # 描述和配置sink组件：k1
    a1.sinks.k1.type = logger

    # 描述和配置channel组件，此处使用是内存缓存的方式
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # 描述和配置source  channel   sink之间的连接关系
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

启动配置文件

指定采集方案配置文件，在相应的节点上启动flume agent

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1 -Dflume.root.logger=INFO,console
# -c conf 指定flume自身的配置文件所在目录
# -f conf/netcat-logger.conf 指定所描述的采集方案
# -n a1 指定这个agent的名字

安装telent准备测试

在node02上安装telnet客户端用于模拟数据的发送
```
yum -y install telnet
telnet  node03  44444   # 使用telnet模拟数据发送
```

采集案例

采集目录到HDFS

某服务器的特定目录下会不断产生新的文件，每当有新文件出现，就需要把文件采集到HDFS中去

根据需求，首先定义以下3大要素
- 数据源组件，即source -- 监控文件目录：spooldirspooldir特性：
  - 监视一个目录，只要目录中出现新文件，就会采集文件中的内容
  - 采集完成的文件，会被agent自动添加一个后缀：COMPLETED
  - 所监视的目录中不允许重复出现相同文件名的文件
- 下沉组件，妈sink -- HDFS文件系统：hdfs sink
- 通道组件，妈channel -- 可用file channel 也可以用内存 memory channel

flume配置文件开发

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
mkdir -p /export/servers/dirfile
vim spooldir.conf
    # 定义agent的组件名字
    a1.sources=sr1
    a1.sinks=sk1
    a1.channels=scn1

    # 配置数据源source
    a1.sources.sr1.type=spooldir
    a1.sources.sr1.spoolDir=/export/servers/dirfile
    a1.sources.sr1.fileHeader=true

    # 配置下沉组件sink
    a1.sinks.sk1.type=hdfs
    a1.sinks.sk1.channel=scn1
    # hdfs目录路径
    a1.sinks.sk1.hdfs.path=hdfs://node01:8020/spooldir/files/%y-%m-%d/%H%M/
    # 写入hdfs的文件名前缀 可以使用flume提供的日期及%{host}表达式
    a1.sinks.sk1.hdfs.filePrefix=events-
    # 表示到了需要触发的时间时，是否要更新文件夹，true:表示要
    a1.sinks.sk1.hdfs.round=true
    # 表示每隔value分钟改变一次(在0~24之间)
    a1.sinks.sk1.hdfs.roundValue=10
    # 切换文件的时候的时间单位是分钟
    a1.sinks.sk1.hdfs.roundUnit=minute
    # 多久时间后close hdfs文件。单位是秒，默认30秒。设置为0的话表示不根据时间close hdfs文件
    a1.sinks.sk1.hdfs.rollInterval=3
    # 文件大小超过一定值后，close文件。默认值1024，单位是字节。设置为0的话表示不基于文件大小,134217728表 示128m，决定了多大块可以切一个文件。
    a1.sinks.sk1.hdfs.rollSize=134217728
    # 写入了多少个事件后close文件。默认值是10个。设置为0的话表示不基于事件个数
    a1.sinks.sk1.hdfs.rollCount=0
    # 批次数，HDFS Sink每次从Channel中拿的事件个数。默认值100
    a1.sinks.sk1.hdfs.batchSize=100
    # 使用本地时间戳
    a1.sinks.sk1.hdfs.useLocalTimeStamp=true
    #生成的文件类型默认是 Sequencefile,可用DataStream则为普通文本
    a1.sinks.sk1.hdfs.fileType=DataStream

    # 配置通道channel
    a1.channels.scn1.type=memory
    a1.channels.scn1.capacity=1000
    a1.channels.scn1.transactionCapacity=100

bin/flume-ng agent -c ./conf/ -f ./conf/spooldir.conf -n a1 -Dflume.root.logger=INFO,console # 运行flume

采集文件到HDFS

比如业务系统使用Log4j生成的日志，日志内容不断增加，需要把追加到日志文件中的数据实时采集到hdfs

根据需求，首先定义以下3大要素
- 采集源，即source——监控文件内容更新 : exec ‘tail -F file’
- 下沉目标，即sink——HDFS文件系统 : hdfs sink
- Source和sink之间的传递通道——channel，可用filechannel 也可以用内存channel

定义flume的配置文件

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim tail-file.conf
    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1

    # Describe/configure tail -F source1
    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F /export/servers/taillogs/access_log
    agent1.sources.source1.channels = channel1

    #configure host for source
    #agent1.sources.source1.interceptors = i1
    #agent1.sources.source1.interceptors.i1.type = host
    #agent1.sources.source1.interceptors.i1.hostHeader = hostname

    # Describe sink1
    agent1.sinks.sink1.type = hdfs
    #a1.sinks.k1.channel = c1
    agent1.sinks.sink1.hdfs.path = hdfs://node01:8020/weblog/flume-collection/%y-%m-%d/%H-%M
    agent1.sinks.sink1.hdfs.filePrefix = access_log
    agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
    agent1.sinks.sink1.hdfs.batchSize= 100
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.hdfs.writeFormat =Text
    agent1.sinks.sink1.hdfs.rollSize = 102400
    agent1.sinks.sink1.hdfs.rollCount = 1000000
    agent1.sinks.sink1.hdfs.rollInterval = 60
    agent1.sinks.sink1.hdfs.round = true
    agent1.sinks.sink1.hdfs.roundValue = 10
    agent1.sinks.sink1.hdfs.roundUnit = minute
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

    # Use a channel which buffers events in memory
    agent1.channels.channel1.type = memory
    agent1.channels.channel1.keep-alive = 120
    agent1.channels.channel1.capacity = 500000
    agent1.channels.channel1.transactionCapacity = 600

    # Bind the source and sink to the channel
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1

bin/flume-ng agent -c conf -f conf/tail-file.conf -n agent1  -Dflume.root.logger=INFO,console #启动Flume
# 开发shell脚本定时追加文件内容
mkdir -p /export/servers/shells/
cd /export/servers/shells/
    vim tail-file.sh
    #!/bin/bash
    while true
    do
     date >> /export/servers/taillogs/access_log;

两个agent级联

技术图片

第一个agent负责收集文件当中的数据，通过网络发送到第二个agent当中去，第二个agent负责接收第一个agent发送的数据，并将数据保存到hdfs上面去

第一步：node02安装flume

cd /export/servers
scp -r apache-flume-1.6.0-cdh5.14.0-bin/ node02:$PWD

第二步：node02配置flume配置文件

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim tail-avro-avro-logger.conf
    ##################
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Describe/configure the source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /export/servers/taillogs/access_log
    a1.sources.r1.channels = c1
    # Describe the sink
    ##sink端的avro是一个数据发送者
    a1.sinks = k1
    a1.sinks.k1.type = avro
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hostname = 192.168.52.120
    a1.sinks.k1.port = 4141
    a1.sinks.k1.batch-size = 10
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

第三步：node02开发脚本文件往文件写入数据

# 直接把node03的脚本拷贝至node02
cd /export/servers
scp -r shells/ taillogs/ node02:$PWD

第四步node03开发Flume配置文件

# 在node03机器上开发flume的配置文件
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim avro-hdfs.conf #配置如下
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
##source中的avro组件是一个接收者服务
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.52.120
a1.sources.r1.port = 4141
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/avro/hdfs/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
#生成的文件类型，默认是Sequencefile，可用DataStream，则为普通文本
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

第五步顺序启动

# node03机器启动flume进程
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -c conf -f conf/avro-hdfs.conf -n a1  -Dflume.root.logger=INFO,console  

# node02机器启动flume进程
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/
bin/flume-ng agent -c conf -f conf/tail-avro-avro-logger.conf -n a1  -Dflume.root.logger=INFO,console    

# node02机器启shell脚本生成文件
cd /export/servers/shells
sh tail-file.sh

高可用Flume-NG配置案例failover

角色分配

名称 HOST 角色

Agent1 node01 Web Server

Collector1 node02 AgentMstr1

Collector2 node03 AgentMstr2

名称	HOST	角色
Agent1	node01	Web Server
Collector1	node02	AgentMstr1
Collector2	node03	AgentMstr2

node01安装配置flume

# node03机器执行以下命令
cd /export/servers
scp -r apache-flume-1.6.0-cdh5.14.0-bin/ node01:$PWD
scp -r shells/ taillogs/ node01:$PWD

# node01机器配置agent的配置文件
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim agent.conf #配置如下

#agent1 name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
#
##set gruop
agent1.sinkgroups = g1
#
##set channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
#
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /export/servers/taillogs/access_log
#
agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp
#
## set sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = node02
agent1.sinks.k1.port = 52020
#
## set sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = node03
agent1.sinks.k2.port = 52020
#
##set sink group
agent1.sinkgroups.g1.sinks = k1 k2
#
##set failover
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000
#

node02与node03配置flumecollection

# node02机器修改配置文件
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim collector.conf
#set Agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#
##set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#
## other node,nna to nns
a1.sources.r1.type = avro
a1.sources.r1.bind = node02
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = node02
a1.sources.r1.channels = c1
#
##set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path= hdfs://node01:8020/flume/failover/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d

# node03机器修改配置文件
cd  /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim collector.conf
#set Agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#
##set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#
## other node,nna to nns
a1.sources.r1.type = avro
a1.sources.r1.bind = node03
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = node03
a1.sources.r1.channels = c1
#
##set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path= hdfs://node01:8020/flume/failover/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d

顺序启动命令

# node03机器上面启动flume
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/collector.conf -Dflume.root.logger=DEBUG,console

# node02机器上面启动flume
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/collector.conf -Dflume.root.logger=DEBUG,console

# node01机器上面启动flume
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -n agent1 -c conf -f conf/agent.conf -Dflume.root.logger=DEBUG,console

# node01机器启动文件产生脚本
cd  /export/servers/shells
sh tail-file.sh

FAILOVER测试
- Collector1宕机，Collector2获取优先上传权限
- 重启Collector1服务，Collector1重新获得优先上传的权限

Flume的负载均衡 load balancer

负载均衡是用于解决一台机器(一个进程)无法解决所有请求而产生的一种算法。Load balancing Sink Processor 能够实现 load balance 功能，如下图Agent1 是一个路由节点，负责将
Channel 暂存的 Event 均衡到对应的多个 Sink组件上，而每个 Sink 组件分别连接到一个独立的 Agent 上，示例配置，如下所示：

技术图片

在此处我们通过三台机器来进行模拟flume的负载均衡

三台机器规划如下：

node01：采集数据，发送到node02和node03机器上去

node02：接收node01的部分数据

node03：接收node01的部分数据

第一步：开发node01服务器的flume配置

# node01服务器配置：
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim load_banlancer_client.conf

#agent name
a1.channels = c1
a1.sources = r1
a1.sinks = k1 k2

#set gruop
a1.sinkgroups = g1

#set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /export/servers/taillogs/access_log

# set sink1
a1.sinks.k1.channel = c1
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node02
a1.sinks.k1.port = 52020

# set sink2
a1.sinks.k2.channel = c1
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = node03
a1.sinks.k2.port = 52020

#set sink group
a1.sinkgroups.g1.sinks = k1 k2

#set failover
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinkgroups.g1.processor.selector.maxTimeOut=10000

第二步：开发node02服务器的flume配置

# node02服务器配置：
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim load_banlancer_server.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = node02
a1.sources.r1.port = 52020

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

第三步：开发node03服务器flume配置

# node03服务器配置
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim load_banlancer_server.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = node03
a1.sources.r1.port = 52020

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

第四步：准备启动flume服务

# 启动node03的flume服务
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/load_banlancer_server.conf -Dflume.root.logger=DEBUG,console

# 启动node02的flume服务
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/load_banlancer_server.conf -Dflume.root.logger=DEBUG,console

# 启动node01的flume服务
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/load_banlancer_client.conf -Dflume.root.logger=DEBUG,console

# node01服务器运行脚本产生数据
cd /export/servers/shells
sh tail-file.sh

Flume案例一

把A、B 机器中的access.log、nginx.log、web.log 采集汇总到C机器上然后统一收集到hdfs中。

但是在hdfs中要求的目录为：

/source/logs/access/20180101/**

/source/logs/nginx/20180101/**

/source/logs/web/20180101/**

采集端配置文件开发

# node01与node02服务器开发flume的配置文件
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim exec_source_avro_sink.conf

# Name the components on this agent
a1.sources = r1 r2 r3
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /export/servers/taillogs/access.log
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
##  static拦截器的功能就是往采集到的数据的header中插入自己定## 义的key-value对
a1.sources.r1.interceptors.i1.key = type
a1.sources.r1.interceptors.i1.value = access

a1.sources.r2.type = exec
a1.sources.r2.command = tail -F /export/servers/taillogs/nginx.log
a1.sources.r2.interceptors = i2
a1.sources.r2.interceptors.i2.type = static
a1.sources.r2.interceptors.i2.key = type
a1.sources.r2.interceptors.i2.value = nginx

a1.sources.r3.type = exec
a1.sources.r3.command = tail -F /export/servers/taillogs/web.log
a1.sources.r3.interceptors = i3
a1.sources.r3.interceptors.i3.type = static
a1.sources.r3.interceptors.i3.key = type
a1.sources.r3.interceptors.i3.value = web

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node03
a1.sinks.k1.port = 41414

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 20000
a1.channels.c1.transactionCapacity = 10000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sources.r2.channels = c1
a1.sources.r3.channels = c1
a1.sinks.k1.channel = c1

服务端配置文件开发

# 在node03上面开发flume配置文件
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim avro_source_hdfs_sink.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1
#定义source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.52.120
a1.sources.r1.port =41414

#添加时间拦截器
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

#定义channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 20000
a1.channels.c1.transactionCapacity = 10000

#定义sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=hdfs://192.168.52.100:8020/source/logs/%{type}/%Y%m%d
a1.sinks.k1.hdfs.filePrefix =events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
#时间类型
a1.sinks.k1.hdfs.useLocalTimeStamp = true
#生成的文件不按条数生成
a1.sinks.k1.hdfs.rollCount = 0
#生成的文件按时间生成
a1.sinks.k1.hdfs.rollInterval = 30
#生成的文件按大小生成
a1.sinks.k1.hdfs.rollSize  = 10485760
#批量写入hdfs的个数
a1.sinks.k1.hdfs.batchSize = 10000
#flume操作hdfs的线程数（包括新建，写入等）
a1.sinks.k1.hdfs.threadsPoolSize=10
#操作hdfs超时时间
a1.sinks.k1.hdfs.callTimeout=30000

#组装source、channel、sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

采集端文件生成脚本

cd /export/servers/shells
vim server.sh 

#!/bin/bash
while true
do  
 date >> /export/servers/taillogs/access.log; 
 date >> /export/servers/taillogs/web.log;
 date >> /export/servers/taillogs/nginx.log;
  sleep 0.5;
done

顺序启动服务

# node03启动flume实现数据收集
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -c conf -f conf/avro_source_hdfs_sink.conf -name a1 -Dflume.root.logger=DEBUG,console

# node01与node02启动flume实现数据监控
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -c conf -f conf/exec_source_avro_sink.conf -name a1 -Dflume.root.logger=DEBUG,console

# node01与node02启动生成文件脚本
cd /export/servers/shells
sh server.sh

flume安装与使用

标签：load 实战案例 figure 安装配置数据监控 cal arc 三台表达式

原文地址：https://www.cnblogs.com/winter-shadow/p/11444572.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行

flume安装与使用

日志采集框架Flume

Flume介绍

Flume实战案例

安装部署

采集案例

采集目录到HDFS

采集文件到HDFS

两个agent级联

第一步：node02安装flume

第二步：node02配置flume配置文件

第三步：node02开发脚本文件往文件写入数据

第四步node03开发Flume配置文件

第五步顺序启动

更多source和sink组件

高可用Flume-NG配置案例failover

Flume的负载均衡 load balancer

Flume案例一