Flume Basics and Environment Setup


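1. Core components
1) source: collects data (reads events from an external origin)
2) channel: buffers data (in memory, in files, or in Kafka)
3) sink: outputs data (reads events from the channel and writes them to one or more destinations, such as HDFS or Hive)
2. Environment setup (assumes JDK 8 is already installed)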

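# Upload the archive and extract it (run from /root so that -C software resolves to /root/software)
tar zxvf /root/software/tran_zip/apache-flume-1.6.0-bin.tar.gz -C software
# Assumed step: rename the extracted directory to the path used in the commands below
mv /root/software/apache-flume-1.6.0-bin /root/software/flume-1.6.0
# Configure the environment variables
vi /etc/profile
# Reload the profile
source /etc/profile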

# Create and edit the configuration file flume-env.sh
cd /root/software/flume-1.6.0/conf
cp flume-env.sh.template flume-env.sh
vi flume-env.sh

# Verify the installation
# Run this from the bin directory (or from anywhere once bin is on PATH)
flume-ng version
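The article does not show what goes into /etc/profile. A minimal sketch, assuming Flume lives in /root/software/flume-1.6.0 (the path used in the cd step above), could look like the following. The JAVA_HOME path is not given in the article, so it appears only as a commented placeholder.

```shell
# Lines to append to /etc/profile.
# FLUME_HOME follows the directory used in this article; adjust it if yours differs.
export FLUME_HOME=/root/software/flume-1.6.0
export PATH="$PATH:$FLUME_HOME/bin"

# In conf/flume-env.sh, point JAVA_HOME at your JDK 8 directory.
# The path below is a placeholder, not taken from the article:
# export JAVA_HOME=/path/to/jdk8
```

With FLUME_HOME exported, the later commands that reference $FLUME_HOME/conf resolve to the conf directory of this installation.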


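3. Hands-on 1: collect data from a network port and print it to the console
The key to using Flume is writing the configuration file:
1) Configure the source
2) Configure the channel
3) Configure the sink
4) Wire the three components together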

# example.conf: A single-node Flume configuration
# a1: agent name
# r1: source name
# c1: channel name
# k1: sink name

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
# Bind address: use localhost for local-only tests, or the machine's hostname (master here)
a1.sources.r1.bind = master
a1.sources.r1.port = 44444

# Describe the sink: log events to the console
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory 
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# A source can write to multiple channels, but each sink reads from exactly one channel
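The memory channel above runs with its default sizes. The sketch below writes a small fragment that sets them explicitly through the `capacity` and `transactionCapacity` properties of the memory channel. The scratch file name channel-fragment.conf is illustrative, and the values 1000 and 100 are example sizes, not tuned recommendations.

```shell
# Write the channel sizing lines to a scratch file (illustrative file name);
# copy them into example.conf next to the a1.channels.c1.type line.
cat > channel-fragment.conf <<'EOF'
# Maximum number of events held in the channel
a1.channels.c1.capacity = 1000
# Maximum number of events per transaction (must not exceed capacity)
a1.channels.c1.transactionCapacity = 100
EOF
```

A larger capacity lets the channel absorb bursts while the sink catches up, at the cost of more heap, and events still in a memory channel are lost if the agent stops.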


Start the agent

flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/example.conf --name a1 -Dflume.root.logger=INFO,console

Test it

# telnet must be installed first with yum: yum -y install telnet
telnet master 44444

4. Hands-on 2: monitor a file and print newly appended data to the console in real time

# Agent design: exec source
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/data/flume_data.log

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory 
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# A source can write to multiple channels, but each sink reads from exactly one channel


Run it:

# Steps:
# 1. Create the file exec-memory-logger.conf
# 2. Fill it with the configuration above
# 3. Start the agent
flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec-memory-logger.conf --name a1 -Dflume.root.logger=INFO,console
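To see events arrive, something has to append lines to the monitored file while the agent is running. A minimal sketch of such a helper follows; the function name append_events and the message text are illustrative and not part of Flume.

```shell
# append_events FILE COUNT: append COUNT numbered lines to FILE.
# Each appended line is picked up by `tail -F` and becomes one Flume event.
append_events() {
  file=$1
  count=$2
  i=1
  while [ "$i" -le "$count" ]; do
    echo "test event $i" >> "$file"
    i=$((i + 1))
  done
}
```

From a second terminal, running `append_events /root/data/flume_data.log 5` should make five new events appear in the agent's console output.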


5. Hands-on 3: ship logs from server A (website) to server B (hdfs) in real time
Server A: exec source + memory channel + avro sink (used to cross nodes)
Server B: avro source + memory channel + logger sink (prints to the console)

# On server A
# File: aver-memory-avro.conf

# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /root/data/log_data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Describe the sink: forward events over Avro to server B
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = master
exec-memory-avro.sinks.avro-sink.port = 44444

# Use a channel which buffers events in memory 
exec-memory-avro.channels.memory-channel.type = memory

# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel



# On server B
# File: aver-memory-logger.conf

# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = master
avro-memory-logger.sources.avro-source.port = 44444

# Describe the sink: log events to the console
avro-memory-logger.sinks.logger-sink.type = logger

# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory

# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
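A mistyped key, for example `.source.` in place of `.sources.`, leaves a component without a channel, and the agent will not move any data through it. The sketch below is a hypothetical pre-flight check (check_bindings is not a Flume tool) that reads the declared source and sink names from a config file and verifies that each one has a channel binding line.

```shell
# check_bindings AGENT FILE: print every source lacking a ".channels" line and
# every sink lacking a ".channel" line; return non-zero if any is missing.
check_bindings() {
  agent=$1
  file=$2
  status=0
  # Names declared on the "<agent>.sources = ..." and "<agent>.sinks = ..." lines
  sources=$(sed -n "s/^$agent\.sources[[:space:]]*=[[:space:]]*//p" "$file")
  sinks=$(sed -n "s/^$agent\.sinks[[:space:]]*=[[:space:]]*//p" "$file")
  for s in $sources; do
    grep -q "^$agent\.sources\.$s\.channels[[:space:]]*=" "$file" \
      || { echo "source $s has no channels binding"; status=1; }
  done
  for k in $sinks; do
    grep -q "^$agent\.sinks\.$k\.channel[[:space:]]*=" "$file" \
      || { echo "sink $k has no channel binding"; status=1; }
  done
  return $status
}
```

For the server B file this would be called as `check_bindings avro-memory-logger $FLUME_HOME/conf/aver-memory-logger.conf`. It only checks that the binding keys exist, not that the named channel is declared.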

Start the agents

# 1. Start the logger agent on server B first
flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/aver-memory-logger.conf --name avro-memory-logger -Dflume.root.logger=INFO,console
# 2. Then start the aver-memory-avro agent on server A
flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/aver-memory-avro.conf --name exec-memory-avro -Dflume.root.logger=INFO,console

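Basic flow of hands-on 3: on server A, the exec source tails the log file and writes events to the memory channel, and the avro sink sends them to master:44444. On server B, the avro source listening on that port receives the events, buffers them in its own memory channel, and the logger sink prints them to the console.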

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/zaji/5679484.html
