
FlinkX: near-real-time data synchronization from Kafka to Hive

FlinkX implements data pipelines on top of Flink.
A pipeline consists of two parts, a reader and a writer, each implemented by a plugin for the corresponding data source.

e.g. KafkaReader, HiveWriter

KafkaReader

  • KafkaReader consumes messages through the Kafka client, parses them, and converts each one into a Row
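The consume-parse-convert step can be sketched as follows. This is a hypothetical helper, not FlinkX's actual class: a Kafka record value arrives as raw bytes, and the reader decodes it and splits it into positional row fields (a comma-separated text codec is assumed here purely for illustration; FlinkX supports configurable decoders).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the KafkaReader parse step:
// decode a raw Kafka record value and map it into a positional "row".
public class KafkaRowParser {

    // Decode bytes as UTF-8 text and split on commas.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    public static String[] toRow(byte[] value) {
        String text = new String(value, StandardCharsets.UTF_8);
        return text.split(",", -1);
    }
}
```

In the real reader the resulting fields would then be packed into a Flink `Row` and emitted downstream.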

HiveWriter

Reader implementation

The reader is implemented by extending two types:

  1. InputFormatSourceFunction -> RichParallelSourceFunction -> AbstractRichFunction (implements RichFunction)
```java
/**
 * An abstract stub implementation for rich user-defined functions.
 * Rich functions have additional methods for initialization ({@link #open(Configuration)}) and
 * teardown ({@link #close()}), as well as access to their runtime execution context via
 * {@link #getRuntimeContext()}.
 */
@Public
```
  2. CheckpointedFunction
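The two contracts give the reader (1) an open/close lifecycle plus runtime context, and (2) state snapshot/restore hooks for checkpointing. The sketch below uses simplified stand-in methods that mirror the shape of Flink's `RichFunction` and `CheckpointedFunction`, but they are not the real Flink interfaces; the class and state here are illustrative only.

```java
// Stand-in illustrating the two contracts the reader inherits:
// RichFunction-style open()/close() and CheckpointedFunction-style
// snapshotState()/initializeState(). Not real Flink APIs.
public class CheckpointedKafkaSource {

    private long offset;     // consumer position: the state to checkpoint
    private boolean running;

    // Lifecycle: acquire resources (e.g. the Kafka client) here
    public void open() { running = true; }

    public void close() { running = false; }

    // Called on each checkpoint: persist the current consume position
    public long snapshotState() { return offset; }

    // Called on (re)start: restore the position from the last checkpoint
    public void initializeState(long restoredOffset) { offset = restoredOffset; }

    // Stand-in for consuming one record; advances the offset
    public long poll() { return offset++; }

    public boolean isRunning() { return running; }
}
```

After a failure, a new instance restored via `initializeState` resumes from the snapshotted offset instead of re-reading from the beginning, which is what makes the Kafka→Hive sync near-real-time yet recoverable.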

Writer implementation

File-type writers

writeRecord() first writes files into a .data staging directory;
when Flink's global checkpoint triggers close(), the files in the staging directory are copied into the actual data directory.
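This stage-then-promote pattern can be sketched with plain `java.nio.file` (class name, file names, and the exact promotion trigger are illustrative; FlinkX's actual writer classes differ):

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the file writer's two-phase write: rows land in a hidden
// ".data" staging directory, and only a successful checkpoint promotes
// them into the query-visible data directory.
public class TwoPhaseFileWriter {

    private final Path dataDir;   // final, query-visible directory
    private final Path stageDir;  // hidden ".data" staging directory

    public TwoPhaseFileWriter(Path dataDir) throws IOException {
        this.dataDir = dataDir;
        this.stageDir = dataDir.resolve(".data");
        Files.createDirectories(stageDir);
    }

    // writeRecord(): rows go to the staging directory only,
    // so readers of dataDir never see half-written files
    public void writeRecord(String fileName, String row) throws IOException {
        Files.writeString(stageDir.resolve(fileName), row + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // On a successful global checkpoint, promote every staged file
    public void commitOnCheckpoint() throws IOException {
        try (DirectoryStream<Path> staged = Files.newDirectoryStream(stageDir)) {
            for (Path p : staged) {
                Files.move(p, dataDir.resolve(p.getFileName()),
                        StandardCopyOption.ATOMIC_MOVE);
            }
        }
    }
}
```

Because the move happens only after the checkpoint completes, a crash between checkpoints leaves at most some uncommitted files in `.data`, never partial data in the Hive table directory.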

Checkpointing is enabled by setting flink.checkpoint.interval in FlinkX's confProp startup parameter.
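A minimal launch sketch passing confProp on the command line (the job path, mode, and a 60 s interval are illustrative; check your FlinkX version's launcher flags):

```shell
bin/flinkx \
  -mode local \
  -job job/kafka_to_hive.json \
  -confProp '{"flink.checkpoint.interval":60000}'
```

With the interval set, Flink triggers periodic global checkpoints, which in turn drive the staged-file promotion described above.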

JDBC-type writers