Skip to content

Process Table Function

Apache Flink PTF and Confluent Cloud PTFs for flink SQL, deployed as UDF. It supports N rows to M rows semantics.

Features

  • Declare and implement State object, that will be persisted by flink, by partition key.
  • The TableAPI or SQL most powerful function API. Supports Stateless or Stateful processing.
  • Access to Flink’s managed state, event-time and timer services, and underlying table changelogs

    public static class CountState {
        public int count = 0;
    }
    
    public void eval(
        @StateHint CountState state,
        @ArgumentHint(SET_SEMANTIC_TABLE) Row input
    ) {
        state.count++;
        ...
    

    The increment is per partition key. Once deployed it is used as:

    SELECT *
    FROM EventCounter(
        input => TABLE examples.marketplace.clicks PARTITION BY user_id,
        uid => 'event-counter-v1'
    );
    

Use Cases

PTFs unlock use cases that can’t be expressed in a declarable way with either SQL or the Table API implementations. It serves a similar purposes compared to the ProcessFunction in Apache Flink’s Datastream API, giving primitives for handling the most common building blocks for stateful processing applications: events, state and timers.

  • Apply transformations on each row of a table.
  • Logically partition the table into distinct sets and apply transformations per set.
  • Store seen events for repeated access.
  • Continue the processing at a later point in time enabling waiting, synchronization, or timeouts.
  • Buffer and aggregate events using complex state machines or rule-based conditional logic.

Sources