Confluent Tableflow

Product blog

Goals

Tableflow represents a Kafka topic as a table in Apache Iceberg or Delta Lake format. It uses Schema Registry to get the schema definition of the table, and it provides a unified storage view on top of object storage.

The Kafka topic is the source of truth for the data.
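
Once materialized, the table can be read by any Iceberg-compatible engine. Below is a minimal sketch using PyIceberg against the Iceberg REST catalog that Tableflow exposes; the endpoint URI, credentials, and the `prod-cluster.orders` table name are placeholders to adapt, not actual values.

```python
from pyiceberg.catalog import load_catalog

# Connect to the Iceberg REST catalog exposed by Tableflow.
# URI and credential below are hypothetical placeholders.
catalog = load_catalog(
    "tableflow",
    **{
        "type": "rest",
        "uri": "https://tableflow.<region>.aws.confluent.cloud/iceberg/catalog/organizations/<org>/environments/<env>",
        "credential": "<api-key>:<api-secret>",
    },
)

# Tables are namespaced; "prod-cluster.orders" is an assumed name.
table = catalog.load_table("prod-cluster.orders")

# Scan a sample of the table into a pandas DataFrame for inspection.
df = table.scan(limit=100).to_pandas()
print(df.head())
```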

Value Propositions

  • It is a managed cloud service, available per region.
  • Data from the topic is moved to object storage in Parquet format with Iceberg metadata.
  • You need to bring your own object storage (an S3 bucket).
  • Works with private networking, using gateway private endpoints to S3, so no traffic goes over the public internet.
  • Data freshness can be tuned down to 15 minutes (the default is 6 hours). For fresher data, it can read from the brokers directly, at the minute level.
  • Materialization starts from the earliest offset.
  • Can compact multiple small files into a bigger file (see the sketch after this list).
  • It keeps track of committed offsets in Iceberg.
  • Data is written encrypted at the source level.
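
To make the compaction point concrete, here is a standalone sketch of what file compaction amounts to: reading many small Parquet files and rewriting them as one larger file. This illustrates the idea only; it is not Tableflow's internal implementation, and the paths are hypothetical.

```python
from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq

def compact_parquet(small_files: list[Path], target: Path) -> None:
    """Merge many small Parquet files into one larger file.

    Fewer, bigger files mean fewer object-store requests and
    less file metadata for query engines to track.
    """
    tables = [pq.read_table(f) for f in small_files]
    merged = pa.concat_tables(tables)
    pq.write_table(merged, target, compression="zstd")

# Hypothetical usage: compact all small part files under data/.
small = sorted(Path("data").glob("part-*.parquet"))
compact_parquet(small, Path("data/compacted-000.parquet"))
```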

Current limitations

  • Consumes only from append-only, non-compacted topics (see the check after this list).
  • DLQ (dead letter queue) is not supported yet.
  • The log type cannot be changed once Tableflow is enabled.
  • No support for BYOK (bring-your-own-key) clusters.
  • Could
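
Since only append-only, non-compacted topics qualify, it can be worth verifying a topic's cleanup policy before enabling Tableflow. A sketch using the confluent-kafka AdminClient follows; the bootstrap endpoint and the "orders" topic name are placeholders.

```python
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "<bootstrap-endpoint>"})

# Describe the topic configuration; "orders" is a placeholder topic.
resource = ConfigResource(ConfigResource.Type.TOPIC, "orders")
configs = admin.describe_configs([resource])[resource].result()

policy = configs["cleanup.policy"].value
if policy != "delete":
    raise SystemExit(
        f"cleanup.policy={policy}: compacted topics are not supported by Tableflow"
    )
print("Topic is append-only; eligible for Tableflow.")
```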

Architecture

  • Kafka cluster on Confluent Cloud
  • Tableflow capability enabled on the topic
  • S3 access policy and a service role that Confluent can assume (sketched below)
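
The bring-your-own-bucket setup requires an IAM role that Confluent can assume plus an S3 access policy on the bucket. Below is a hedged boto3 sketch of that wiring; the principal ARN, external ID, bucket name, and the exact permission set are assumptions to adapt from Confluent's provider-integration documentation, not verified values.

```python
import json

import boto3

iam = boto3.client("iam")
BUCKET = "my-tableflow-bucket"  # placeholder bucket name
CONFLUENT_PRINCIPAL = "arn:aws:iam::<confluent-account>:root"  # assumed value
EXTERNAL_ID = "<external-id>"  # assumed, issued during provider integration

# Trust policy: allow Confluent's account to assume this role
# when it presents the agreed external ID.
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": CONFLUENT_PRINCIPAL},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
    }],
}
role = iam.create_role(
    RoleName="tableflow-s3-access",
    AssumeRolePolicyDocument=json.dumps(trust),
)

# Inline policy: read/write access to the bucket Tableflow writes into.
# The exact action list is an assumption.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": f"arn:aws:s3:::{BUCKET}/*"},
        {"Effect": "Allow",
         "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
         "Resource": f"arn:aws:s3:::{BUCKET}"},
    ],
}
iam.put_role_policy(
    RoleName="tableflow-s3-access",
    PolicyName="tableflow-bucket-access",
    PolicyDocument=json.dumps(policy),
)
print("Role ARN:", role["Role"]["Arn"])
```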

Persisted Data in S3

  • Records keep metadata from the original topic, such as topic name, timestamp, and offset (see the scan sketch below).
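
Because this per-record metadata lands as ordinary columns in the table, it can be filtered like any other field. Here is a sketch continuing the earlier PyIceberg example; the `partition` and `offset` column names are assumptions about how Tableflow names them.

```python
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import And, EqualTo, GreaterThanOrEqual

# Same hypothetical catalog and table as in the Goals sketch;
# properties here are resolved from ~/.pyiceberg.yaml.
catalog = load_catalog("tableflow")
table = catalog.load_table("prod-cluster.orders")

# Filter on the preserved Kafka metadata columns (assumed names).
scan = table.scan(
    row_filter=And(EqualTo("partition", 0), GreaterThanOrEqual("offset", 42_000)),
    selected_fields=("timestamp", "partition", "offset"),
)
print(scan.to_arrow())
```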