- 添加依赖
- 基于 Flink 服务提交任务并执行时需要的依赖包
- 启动前注意
- 构建KafkaSource参数实例
- 构建自定义KafkaMQSource
基于 Flink 服务提交任务并执行时需要的依赖包org.apache.flink flink-connector-kafka_2.121.13.2 provided
基于 flink 服务器提交任务前,先上传依赖包到 flink 的 lib 目录下;然后重启 flink 服务,使 jar 进行加载;否则会出现 ClassNoFoundException 的异常。
- flink-connector-kafka_2.12-1.13.2.jar
- kafka-clients-2.4.1.jar
确保 topic 在 kafka 中是真实存在的,否则将会产生如下的执行异常:
- 运行逻辑:先获取kafka中全部的topic list,再进行正则匹配,得到指定的topic list 调试发现,获取kafka全部topic list返回null。然后产生下述异常,此时创建对应的 topic,等待下次任务重启后将可正常运行。
java.lang.RuntimeException: Unable to retrieve any partitions with KafkaTopicsDescriptor: Topic Regex Pattern (WYSXT_47_(.+)_47_other_47_property_47_post) at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:156) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerbase.open(FlinkKafkaConsumerbase.java:577) at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34) at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102) at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:442) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:582) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100) at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:562) at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) at java.lang.Thread.run(Thread.java:748)构建KafkaSource参数实例
public class KafkaSource implements Serializable { private static final long serialVersionUID = 6060562931782343343L; private String bootStrapServers; private String groupId; private String topic; public String getBootStrapServers() { return bootStrapServers; } public String getGroupId() { return groupId; } public String getTopic() { return topic; } public KafkaSource(Object obj) { final JSONObject json = JSONObject.parseObject(obj.toString()); this.bootStrapServers = json.getString("bootStrapServers"); this.groupId = json.getString("groupId"); this.topic = json.getString("topic"); } }构建自定义KafkaMQSource
基于FlinkKafkaConsumer< T > 类实现KafkaSource,其中KafkaDeserializationSchema< T >类型是用于数据反序列化的,可以将数据组装成你想要的方式然后传递出去。
import java.io.Serializable; import java.util.Map; import java.util.Properties; import java.util.concurrent.ConcurrentHashMap; import java.util.regex.Pattern; import org.apache.flink.api.common.typeinfo.TypeHint; import org.apache.flink.api.common.typeinfo.TypeInformation; import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer; import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema; import org.apache.kafka.clients.CommonClientConfigs; import org.apache.kafka.clients.consumer.ConsumerRecord; public class KafkaMessageSource implements Serializable { private static final long serialVersionUID = -1128615689349479275L; private FlinkKafkaConsumer
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)