Flink Getting Started Guide¶
Updates
- Created 2018
- Updated 02/14/2025: improved the notes on Kubernetes deployment, added a simple demo reference; review done.
- Updated 03/30/2025: consolidated the notes and updated the referenced links.
There are four different approaches to deploy and run Apache Flink / Confluent Flink:
- Local Binary Installation
- Docker-based Deployment
- Kubernetes Deployment: Colima, AKS, EKS, GKE, or Minikube
- Confluent Cloud Managed Service
Prerequisites¶
Before getting started, ensure you have:
- Java 11 or higher installed (OpenJDK)
- Docker Engine and Docker CLI (for Docker and Kubernetes deployments)
- kubectl and Helm (for Kubernetes deployment)
- Confluent Cloud account (for Confluent Cloud deployment)
- Git (to clone this repository)
- Confluent CLI installed
1. Open Source Apache Flink Local Binary Installation¶
This approach is ideal for development and testing on a single machine.
Installation Steps¶
- Download and extract the Flink binary
- Download and extract the Kafka binary (see the download versions page)
- Set the environment variables
- Start the Flink cluster
- Access the Web UI at http://localhost:8081
- Submit a job (Java application)
- Download the needed SQL connector for Kafka
- Stop the cluster

These steps are sketched in the shell example below.
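A minimal walkthrough of these steps; the Flink version, paths, and the connector release are illustrative and should be checked against the current downloads pages:

```bash
# Download and extract the Flink binary (version illustrative; check the downloads page)
tar -xzf flink-1.20.1-bin-scala_2.12.tgz

# Set environment variables
export FLINK_HOME=$(pwd)/flink-1.20.1
export PATH=$FLINK_HOME/bin:$PATH

# Start the local cluster (one JobManager and one TaskManager),
# then access the Web UI at http://localhost:8081
start-cluster.sh

# Submit a Java application, e.g. the packaged WordCount example
flink run $FLINK_HOME/examples/streaming/WordCount.jar

# Download the Kafka SQL connector into lib/ (connector version must match the Flink release)
wget -P $FLINK_HOME/lib \
  https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-kafka/3.3.0-1.20/flink-sql-connector-kafka-3.3.0-1.20.jar

# Stop the cluster
stop-cluster.sh
```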
Running a simple SQL application¶
- Start the Kafka cluster and create the topics
- Use the Flink SQL shell
- Create a table using the Kafka connector
- Create an output table in Debezium format so we can see the before and after states
- Do the simplest Flink processing by counting the messages (see the SQL sketch after this list)
- Start a producer in one terminal
- Verify the result in a second terminal. The results will be a list of Debezium records like:

```json
{"before":null,"after":{"count":1},"op":"c"}
{"before":{"count":1},"after":null,"op":"d"}
{"before":null,"after":{"count":2},"op":"c"}
{"before":{"count":2},"after":null,"op":"d"}
{"before":null,"after":{"count":3},"op":"c"}
{"before":{"count":3},"after":null,"op":"d"}
{"before":null,"after":{"count":4},"op":"c"}
```

- [Optional] Start the SQL Gateway for concurrent SQL queries
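As a sketch of these steps, assume a local Kafka broker on localhost:9092 and illustrative topic names `flink-demo` (input) and `flink-demo-count` (output). On the terminal side, assuming Kafka's `bin` directory is on the PATH:

```bash
# Create the input topic
kafka-topics.sh --create --topic flink-demo --bootstrap-server localhost:9092

# Open the Flink SQL shell
$FLINK_HOME/bin/sql-client.sh

# In a second terminal, produce a few messages
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic flink-demo

# In another terminal, watch the Debezium records emitted by the count query
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic flink-demo-count --from-beginning

# [Optional] Start the SQL Gateway for concurrent SQL queries
$FLINK_HOME/bin/sql-gateway.sh start -Dsql-gateway.endpoint.rest.address=localhost
```

Inside the SQL shell, the source table, the Debezium-formatted output table, and the counting query can look like this (table and topic names are illustrative):

```sql
-- Source table over the input topic ('raw' format treats each record as a single string)
CREATE TABLE kafka_input (
  message STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'flink-demo',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'flink-demo-group',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'raw'
);

-- Output table in Debezium JSON format, so the before/after states are visible
CREATE TABLE message_count (
  `count` BIGINT
) WITH (
  'connector' = 'kafka',
  'topic' = 'flink-demo-count',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'debezium-json'
);

-- The simplest processing: count the incoming messages
INSERT INTO message_count SELECT COUNT(*) AS `count` FROM kafka_input;
```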
See the product documentation for more examples. For Python Table API demonstrations, see this chapter.
Troubleshooting¶
- If port 8081 is already in use, change `rest.port` in `$FLINK_HOME/conf/flink-conf.yaml`
- Check the logs in the `$FLINK_HOME/log` directory
- Ensure Java 11+ is installed and `JAVA_HOME` is set correctly
2. Docker-based Deployment¶
This approach provides containerized deployment using Docker Compose.
Prerequisites¶
- Docker Engine
- Docker Compose
Quick Start¶
- Build a custom Apache Flink image with your own connectors: verify the current Docker image tag, then use the Dockerfile
- Start the Flink session cluster (see the sketch below)
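A minimal sketch of these two steps, using an illustrative image name and tag:

```bash
# Build the custom image; the Dockerfile downloads the connector jars (Kafka, flink-faker, ...)
docker build -t custom-flink:latest deployment/custom-flink-image

# Start the session cluster (JobManager + TaskManager) defined in the compose file
docker compose -f deployment/docker/flink-oss-docker-compose.yaml up -d

# Tear it down when finished
docker compose -f deployment/docker/flink-oss-docker-compose.yaml down
```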
Docker Compose with Kafka¶
To run Flink with Kafka:
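A sketch, assuming a compose file under deployment/docker that declares both the Flink and the Kafka services (the exact file name in this repository may differ):

```bash
# Hypothetical file name; point to the Kafka-enabled compose file in deployment/docker
docker compose -f deployment/docker/flink-kafka-docker-compose.yaml up -d
```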
Customization¶
- Modify `deployment/custom-flink-image/Dockerfile` to add required connectors
- Update `deployment/docker/flink-oss-docker-compose.yaml` for configuration changes
During development, we can use Docker Compose to start a simple Flink session cluster, or a standalone job manager that executes one unique job whose application jar is mounted inside the Docker image. The same environment can be used for SQL-based Flink apps.
As the task manager executes the job, the container running the Flink code must have access to the jars needed to connect to external sources like Kafka, or to tools like flink-faker. Therefore, `deployment/custom-flink-image` contains a Dockerfile that fetches the needed jars and builds a custom Flink image that may be used for the task manager and the SQL client. Always update the jar versions when moving to a new Flink release (see the sketch below).
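A minimal sketch of what such a Dockerfile can look like; the base image tag and connector version are illustrative and must track the Flink release:

```dockerfile
# Base image tag is illustrative; verify the current tag on Docker Hub
FROM flink:1.20-scala_2.12-java17

# Download the connector jars into Flink's lib/ so the task manager and SQL client can load them
RUN wget -P /opt/flink/lib \
    https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-kafka/3.3.0-1.20/flink-sql-connector-kafka-3.3.0-1.20.jar
```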
Docker hub and maven links
3. Kubernetes Deployment¶
This approach provides scalable, production-ready deployment using Kubernetes. See the K8S deployment deeper dive chapter and the lab readme for all command details.
The following is a summary of the basic steps; Makefiles are available in deployment/k8s to simplify deployment.
- Deploy a State Machine example application to validate the deployment
- Verify Flink UI access
- Use the SQL client
- Undeploy

These steps are sketched below.
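A sketch of these steps with the Apache Flink Kubernetes Operator; the operator and cert-manager versions are illustrative (the Makefiles wrap equivalent commands):

```bash
# Install cert-manager, a prerequisite of the operator
kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml

# Install the Flink Kubernetes Operator via Helm
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.10.0/
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator

# Deploy the basic example (it runs the StateMachineExample job) and check its status
kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.10/examples/basic.yaml
kubectl get flinkdeployments

# Access the Flink UI at http://localhost:8081
kubectl port-forward svc/basic-rest 8081

# Undeploy
kubectl delete flinkdeployment basic
```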
Confluent Platform Manager for Flink on Kubernetes¶
See the Kubernetes deployment chapter for detailed instructions, along with the Confluent operator documentation and how to submit a Flink SQL statement with Confluent Manager for Apache Flink.
For Confluent Platform deployment:
- Install the Confluent Flink Operator (see the CMF product getting started)
- Deploy a Confluent Platform Kafka cluster
- Deploy Confluent Manager for Flink (CMF)
- Deploy Flink applications
- Work with SQL
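A sketch of the Helm side of the first steps, assuming the confluentinc chart repository; the chart and release names are illustrative and should be verified against the CMF documentation:

```bash
# Add the Confluent Helm repository
helm repo add confluentinc https://packages.confluent.io/helm

# Install the Flink Kubernetes Operator packaged by Confluent (chart name per CMF docs)
helm upgrade --install cp-flink-kubernetes-operator confluentinc/flink-kubernetes-operator

# Install Confluent Manager for Apache Flink (CMF)
helm upgrade --install cmf confluentinc/confluent-manager-for-apache-flink
```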
4. Confluent Cloud Deployment¶
This approach provides a fully managed Flink service.
Prerequisites¶
- Confluent Cloud account
- Confluent CLI installed
- Environment configured
Getting Started¶
- Create a Flink compute pool (see the CLI sketch after this list)
- Start the SQL client
- Submit SQL statements, for example:

```sql
CREATE TABLE my_table (
  id INT,
  name STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'my-topic',
  'properties.bootstrap.servers' = 'pkc-xxxxx.region.provider.confluent.cloud:9092',
  'properties.security.protocol' = 'SASL_SSL',
  'properties.sasl.mechanism' = 'PLAIN',
  'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";',
  'format' = 'json'
);
```
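The compute pool and SQL client steps can look like this with the Confluent CLI; the pool name, cloud, and region are illustrative:

```bash
# Create a Flink compute pool in the current environment
confluent flink compute-pool create my-pool --cloud aws --region us-east-1 --max-cfu 10

# Start the interactive Flink SQL shell (use the pool and environment IDs from the previous step)
confluent flink shell --compute-pool <POOL_ID> --environment <ENV_ID>
```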
See Confluent Cloud Flink documentation for more details.
See the Shift Left project to manage bigger Flink projects with a dedicated CLI.
Choosing the Right Deployment Approach¶
| Approach | Use Case | Pros | Cons |
| --- | --- | --- | --- |
| Local Binary | Development, testing | Simple setup, fast iteration | Limited scalability; manual configuration and maintenance across machines |
| Docker | Development, testing | Containerized, reproducible | Manual orchestration |
| Kubernetes | Production | Scalable, production-ready | Complex setup |
| Confluent Cloud | Production | Fully managed, no ops | Vendor control plane |