Flink Getting Started Guide¶

Updates

Created 2018
Updated 2/14/2025 - improve note, on k8s deployment and get simple demo reference, review done.
03/30/25: converged the notes and update referenced links
09/25: Review - simplications.

There are four different approaches to deploy and run Apache Flink / Confluent Flink:

Local Binary Installation
Docker-based local deployment
Kubernetes Deployment Colima, AKS, EKS, GKS, King or minicube
Confluent Cloud Managed Service

Pre-requisites¶

Before getting started, ensure you have some of the following dependant components:

Java 11 or higher installed (OpenJDK) for developing Java based solution.
Docker Engine and Docker CLI (for Docker and Kubernetes deployments)
kubectl and Helm (for Kubernetes deployment)
Confluent Cloud account (for Confluent Cloud deployment, and Flink SQL development)
Confluent cli installed for both Confluent Cloud and Confluent Platform
Git clone this repository, to access code and deployment manifests and scripts

1. Open Source Apache Flink Local Binary Installation¶

This approach is ideal for development and testing on a single machine.

Installation Steps¶

Download and extract Flink binary

# Using the provided script
./deployment/product-tar/install-local.sh

# Or manually
curl https://dlcdn.apache.org/flink/flink-1.20.1/flink-1.20.1-bin-scala_2.12.tgz --output flink-1.20.1-bin-scala_2.12.tgz
tar -xzf flink-1.20.1-bin-scala_2.12.tgz

Download and extract Kafka binary See download versions

curl https://downloads.apache.org/kafka/3.9.0/kafka_2.13-3.9.0.tgz  --output kafka_2.13-3.9.0.tgz 
tar -xvf kafka_2.13-3.9.0.tgz
cd kafka_2.13-3.9.0
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties

Set environment variables:

export FLINK_HOME=$(pwd)/flink-1.20.1
export KAFKA_HOME=$(pwd)/kafka_2.13-3.9.0
export PATH=$PATH:$FLINK_HOME/bin:$KAFKA_HOME/bin

Start the Flink cluster:
```
$FLINK_HOME/bin/start-cluster.sh
```
Access the Web UI at http://localhost:8081

Submit a job (Java application)

./bin/flink run ./examples/streaming/TopSpeedWindowing.jar
./bin/flink list
./bin/flink cancel <id>

Download needed SQL connector for Kafka

cd flink-1.20.1
mkdir sql-lib
cd sql-lib
curl https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka/3.4.0-1.20/flink-sql-connector-kafka-3.4.0-1.20.jar --output flink-sql-connector-kafka-3.4.0-1.20.jar

Stop the cluster:
```
$FLINK_HOME/bin/stop-cluster.sh
```

Running a simple SQL application¶

Start Kafka cluster and create topics

$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/kraft/server.properties
$KAFKA_HOME/bin/kafka-topics.sh --create --topic flink-input --bootstrap-server localhost:9092
$KAFKA_HOME/bin/kafka-topics.sh --create --topic message-count --bootstrap-server localhost:9092

Use the Flink SQL Shell:

$FLINK_HOME/bin/sql-client.sh --library $FLINK_HOME/sql-lib

Create a table using Kafka connector:

CREATE TABLE flinkInput (
   `raw` STRING,
   `ts` TIMESTAMP(3) METADATA FROM 'timestamp'
) WITH (
   'connector' = 'kafka',
   'topic' = 'flink-input',
   'properties.bootstrap.servers' = 'localhost:9092',
   'properties.group.id' = 'j9rGroup',
   'scan.startup.mode' = 'earliest-offset',
   'format' = 'raw'
);

Create an output table in a debezium format so we can see the before and after data:

CREATE TABLE msgCount (
   `count` BIGINT NOT NULL
) WITH (
   'connector' = 'kafka',
   'topic' = 'message-count',
   'properties.bootstrap.servers' = 'localhost:9092',
   'properties.group.id' = 'j9rGroup',
   'scan.startup.mode' = 'earliest-offset',
   'format' = 'debezium-json'
);

Make the simplest flink processing by counting the messages:

INSERT INTO msgCount SELECT COUNT(*) as `count` FROM flinkInput;

The result will look like:

[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 2be58d7f7f67c5362618b607da8265d7

Start a producer in one terminal

 bin/kafka-console-producer.sh --topic flink-input --bootstrap-server localhost:9092

Verify result in a second terminal

bin/kafka-console-consumer.sh -topic message-count  --bootstrap-server localhost:9092

The results will be a list of debezium records like

{"before":null,"after":{"count":1},"op":"c"}
{"before":{"count":1},"after":null,"op":"d"}
{"before":null,"after":{"count":2},"op":"c"}
{"before":{"count":2},"after":null,"op":"d"}
{"before":null,"after":{"count":3},"op":"c"}
{"before":{"count":3},"after":null,"op":"d"}
{"before":null,"after":{"count":4},"op":"c"}

[Optional] Start SQL Gateway for concurrent SQL queries:

$FLINK_HOME/bin/sql-gateway.sh start -Dsql-gateway.endpoint.rest.address=localhost

See product documentation for different examples. To do some Python Table API demonstrations see this chapter.

Troubleshooting¶

If port 8081 is already in use, modify $FLINK_HOME/conf/flink-conf.yaml
Check logs in $FLINK_HOME/log directory
Ensure Java 11+ is installed and JAVA_HOME is set correctly

2. Docker-based Deployment¶

This approach provides containerized deployment using Docker Compose.

Quick Start¶

Build a custom Apache Flink image with your own connectors. Verify current docker image tag then use the Dockerfile:
```
cd deployment/custom-flink-image
docker build -t jbcodeforce/myflink .
```

Start Flink session cluster:

cd deployment/docker
docker compose up -d

Docker Compose with Kafka¶

To run Flink with Kafka:

cd deployment/docker
docker compose -f kafka-docker-compose.yaml up -d

Customization¶

Modify deployment/custom-flink-image/Dockerfile to add required connectors
Update deployment/docker/flink-oss-docker-compose.yaml for configuration changes

During development, we can use docker-compose to start a simple Flink session cluster or a standalone job manager to execute one unique job, which has the application jar mounted inside the docker image. We can use this same environment to do SQL based Flink apps.

As Task manager will execute the job, it is important that the container running the flink code has access to the jars needed to connect to external sources like Kafka or other tools like FlinkFaker. Therefore, in deployment/custom-flink-image, there is a Dockerfile to get the needed jars to build a custom Flink image that may be used for Taskmanager and SQL client. Always update the jar version with new Flink version.

Docker hub and maven links

3. Kubernetes Deployment¶

This approach provides scalable, production-ready deployment using Kubernetes.

Apache Flink¶

See the K8S deployment deeper dive chapter and the lab readme for details.

Confluent Platform Manager for Flink on Kubernetes¶

See Kubernetes deployment chapter for detailed instructions, and the deployment folder and readme. See also the Confluent operator documentation, submit Flink SQL Statement with Confluent Manager for Apache Flink:

4. Confluent Cloud Deployment¶

This approach provides a fully managed Flink service and very easy to get started quickly without managing Flink clusters or Kafka Clusters.

Create a Flink compute pool:

confluent flink compute-pool create my-pool --cloud aws --region us-west-2

Start SQL client:
```
confluent flink shell
```

Submit SQL statements:

CREATE TABLE my_table (
  id INT,
  name STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'my-topic',
  'properties.bootstrap.servers' = 'pkc-xxxxx.region.provider.confluent.cloud:9092',
  'properties.security.protocol' = 'SASL_SSL',
  'properties.sasl.mechanism' = 'PLAIN',
  'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";',
  'format' = 'json'
);

See Confluent Cloud Flink documentation for more details.

Explore the Shift Left project, your dedicated CLI for scaling and organizing Confluent Cloud Flink projects with an opinionated, streamlined approach.

Choosing the Right Deployment Approach¶

Approach	Use Case	Pros	Cons
Local Binary	Development, Testing	Simple setup, Fast iteration	Limited scalability, or manual configuration and maintenance on distributed computers.
Docker	Development, Testing	Containerized, Reproducible	Manual orchestration
Kubernetes	Production	Scalable, Production-ready	Complex setup
Confluent Cloud	Production	Fully managed, No ops	Vendor Control Plane

Additional Resources¶

Apache Flink Documentation
Confluent Cloud Documentation
Flink Kubernetes Operator
Docker Hub Flink Images
Shift Left project to manage Flink project at scale.

<< PREVIOUS: Key Concepts >> NEXT: Flink SQL >>