Flink Getting Started Guide¶
Updates
- Created 2018
- Updated 2/14/2025: improved the note on the Kubernetes deployment, added a reference to a simple demo; review done.
- Updated 03/30/25: consolidated the notes and updated the referenced links.
This guide covers four different approaches to deploy and run Apache Flink:
- Local Binary Installation
- Docker-based Deployment
- Kubernetes Deployment: Colima or minikube
- Confluent Cloud Managed Service
Prerequisites¶
Before getting started, ensure you have:
- Java 11 or higher installed (OpenJDK)
- Docker Engine and Docker CLI (for Docker and Kubernetes deployments)
- kubectl and Helm (for Kubernetes deployment)
- Confluent Cloud account (for Confluent Cloud deployment)
- Git (to clone this repository)
- Confluent CLI installed (for Confluent Cloud deployment)
1. Local Binary Installation¶
This approach is ideal for development and testing on a single machine.
Installation Steps¶
1. Download and extract the Flink binary (a shell sketch of all steps follows this list).
2. Download and extract the Kafka binary (see the download versions page).
3. Set the environment variables.
4. Start the Flink cluster.
5. Access the Web UI at http://localhost:8081.
6. Submit a job.
7. Download the needed SQL connector for Kafka.
8. Stop the cluster.
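A minimal shell sketch of these steps; the Flink and Kafka versions, mirror URLs, and connector version below are assumptions, so pick the ones you need from the official download pages:

```bash
# 1-2. Download and extract the Flink and Kafka binaries
#      (versions and mirror are assumptions; older releases move to archive.apache.org)
curl -LO https://dlcdn.apache.org/flink/flink-1.20.0/flink-1.20.0-bin-scala_2.12.tgz
tar -xzf flink-1.20.0-bin-scala_2.12.tgz
curl -LO https://dlcdn.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz
tar -xzf kafka_2.13-3.8.0.tgz

# 3. Set environment variables
export FLINK_HOME=$(pwd)/flink-1.20.0
export PATH=$FLINK_HOME/bin:$PATH

# 4. Start the Flink cluster (one JobManager, one TaskManager)
$FLINK_HOME/bin/start-cluster.sh

# 5. The Web UI is now reachable at http://localhost:8081

# 6. Submit one of the bundled example jobs
$FLINK_HOME/bin/flink run $FLINK_HOME/examples/streaming/WordCount.jar

# 7. Download the SQL connector for Kafka into Flink's lib folder
#    (the connector version must match the Flink version)
curl -L -o $FLINK_HOME/lib/flink-sql-connector-kafka-3.3.0-1.20.jar \
  https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-kafka/3.3.0-1.20/flink-sql-connector-kafka-3.3.0-1.20.jar

# 8. Stop the cluster
$FLINK_HOME/bin/stop-cluster.sh
```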
Running a simple SQL application¶
To submit a sample job:

1. Start the Kafka cluster and create the topics (see the shell sketch after this list).
2. Use the Flink SQL Shell:
    - Create a table using the Kafka connector.
    - Create an output table in Debezium format so we can see the before and after state of each record.
    - Do the simplest Flink processing by counting the messages (see the SQL sketch after this list).
3. Start a producer in one terminal.
4. Verify the results in a second terminal. The results will be a list of Debezium records like:

   ```json
   {"before":null,"after":{"count":1},"op":"c"}
   {"before":{"count":1},"after":null,"op":"d"}
   {"before":null,"after":{"count":2},"op":"c"}
   {"before":{"count":2},"after":null,"op":"d"}
   {"before":null,"after":{"count":3},"op":"c"}
   {"before":{"count":3},"after":null,"op":"d"}
   {"before":null,"after":{"count":4},"op":"c"}
   ```

5. [Optional] Start the SQL Gateway to run concurrent SQL queries.
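For steps 1, 3, and 4, a minimal shell sketch, assuming the Kafka 3.8.0 binary extracted earlier and a single-node KRaft cluster; the topic names (`messages`, `message-counts`) are assumptions shared with the SQL sketch below:

```bash
# Step 1: start a single-node Kafka cluster in KRaft mode
cd kafka_2.13-3.8.0
KAFKA_CLUSTER_ID=$(bin/kafka-storage.sh random-uuid)
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
bin/kafka-server-start.sh -daemon config/kraft/server.properties

# Create the input and output topics (names are assumptions)
bin/kafka-topics.sh --create --topic messages --bootstrap-server localhost:9092
bin/kafka-topics.sh --create --topic message-counts --bootstrap-server localhost:9092

# Step 3: start a producer in one terminal and send JSON records
# matching the source table schema, e.g. {"id":"1","payload":"hello"}
bin/kafka-console-producer.sh --topic messages --bootstrap-server localhost:9092

# Step 4: verify the Debezium records in a second terminal
bin/kafka-console-consumer.sh --topic message-counts --from-beginning --bootstrap-server localhost:9092
```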
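For step 2, a minimal Flink SQL sketch to run from the SQL shell (`$FLINK_HOME/bin/sql-client.sh`); the table names, topic names, and record schema are assumptions:

```sql
-- Source table over the input Kafka topic
CREATE TABLE messages (
  id STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'messages',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'flink-demo',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- Output table in Debezium format, so each update carries its
-- before/after state
CREATE TABLE message_counts (
  `count` BIGINT
) WITH (
  'connector' = 'kafka',
  'topic' = 'message-counts',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'debezium-json'
);

-- Simplest possible processing: count the incoming messages
INSERT INTO message_counts SELECT COUNT(*) AS `count` FROM messages;
```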
See the product documentation for more examples. For Python Table API demonstrations, see this chapter.
Troubleshooting¶
- If port 8081 is already in use, modify `$FLINK_HOME/conf/flink-conf.yaml`
- Check the logs in the `$FLINK_HOME/log` directory
- Ensure Java 11+ is installed and JAVA_HOME is set correctly
2. Docker-based Deployment¶
This approach provides containerized deployment using Docker Compose.
Prerequisites¶
- Docker Engine
- Docker Compose
Quick Start¶
1. Build the custom Flink image (if needed).
2. Start the Flink session cluster (see the sketch below).
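A minimal shell sketch of these two steps, using the image folder and compose file referenced in the Customization section below; the image tag `custom-flink:latest` is an assumption:

```bash
# 1. Build the custom Flink image with the required connectors baked in
docker build -t custom-flink:latest deployment/custom-flink-image

# 2. Start the Flink session cluster (JobManager + TaskManager)
docker compose -f deployment/docker/flink-oss-docker-compose.yaml up -d

# Tear everything down when finished
docker compose -f deployment/docker/flink-oss-docker-compose.yaml down
```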
Docker Compose with Kafka¶
To run Flink with Kafka:
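A minimal sketch; the compose file name below is hypothetical, so substitute the Kafka-enabled compose file shipped in `deployment/docker/`:

```bash
# Start Flink together with a Kafka broker (file name is hypothetical)
docker compose -f deployment/docker/kafka-flink-docker-compose.yaml up -d
```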
Customization¶
- Modify `deployment/custom-flink-image/Dockerfile` to add required connectors
- Update `deployment/docker/flink-oss-docker-compose.yaml` for configuration changes
During development, we can use Docker Compose to start a simple Flink session cluster, or a standalone JobManager that executes one unique job with the application jar mounted inside the Docker image. The same environment can be used for SQL-based Flink applications. Because the TaskManager executes the job, the container running the Flink code must have access to the jars needed to connect to external sources like Kafka or to tools like flink-faker. Therefore, `deployment/custom-flink-image` contains a Dockerfile that fetches the needed jars and builds a custom Flink image usable for the TaskManager and the SQL client. Always update the jar versions when moving to a new Flink version.
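A sketch of what such a Dockerfile can look like; the base image tag and both jar versions are assumptions and must track the Flink version in use:

```dockerfile
# Base image tag is an assumption; align it with your Flink version
FROM flink:1.20.0-scala_2.12-java11

# Add the Kafka SQL connector and the flink-faker data generator to the
# classpath used by the TaskManager and the SQL client
RUN wget -q -P /opt/flink/lib \
      https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-kafka/3.3.0-1.20/flink-sql-connector-kafka-3.3.0-1.20.jar && \
    wget -q -P /opt/flink/lib \
      https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
```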
See the Docker Hub and Maven Central links for the base image and connector jars.
3. Kubernetes Deployment¶
This approach provides scalable, production-ready deployment using Kubernetes. See the K8S deployment deeper dive chapter and the lab readme for all command details.
- Deploy a State Machine example application to validate the deployment (see the sketch after this list).
- Verify Flink UI access.
- Undeploy.
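A minimal sketch based on the Apache Flink Kubernetes Operator quick start; the operator version (1.10.0) is an assumption, and cert-manager must be installed in the cluster first:

```bash
# Install the Flink Kubernetes operator via Helm (version is an assumption)
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.10.0/
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator

# Deploy the basic example, which runs the StateMachineExample job
kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.10/examples/basic.yaml

# Verify Flink UI access via a port-forward to http://localhost:8081
kubectl port-forward svc/basic-example-rest 8081

# Undeploy
kubectl delete flinkdeployment/basic-example
```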
Confluent Platform on Kubernetes¶
For Confluent Platform deployment:
1. Install the Confluent Flink operator (see the sketch after this list).
2. Deploy a Confluent Platform Kafka cluster.
3. Deploy Flink applications.
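A minimal sketch, assuming the public Confluent Helm repository; the chart and release names are assumptions, so verify them against the Confluent operator documentation referenced below:

```bash
# Add the Confluent Helm repository
helm repo add confluentinc https://packages.confluent.io/helm
helm repo update

# 1. Install the Confluent Flink Kubernetes operator (chart name is an assumption)
helm install cp-flink-kubernetes-operator confluentinc/flink-kubernetes-operator

# 2. Install Confluent for Kubernetes and deploy a Kafka cluster from a
#    CRD manifest (kafka-cluster.yaml is a hypothetical file name)
helm install confluent-operator confluentinc/confluent-for-kubernetes
kubectl apply -f kafka-cluster.yaml
```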
See the Kubernetes deployment chapter for detailed instructions, as well as the Confluent operator documentation.
4. Confluent Cloud Deployment¶
This approach provides a fully managed Flink service.
Prerequisites¶
- Confluent Cloud account
- Confluent CLI installed
- Environment configured
Getting Started¶
1. Create a Flink compute pool (see the CLI sketch after this list).
2. Start the SQL client.
3. Submit SQL statements:

   ```sql
   CREATE TABLE my_table (
     id INT,
     name STRING
   ) WITH (
     'connector' = 'kafka',
     'topic' = 'my-topic',
     'properties.bootstrap.servers' = 'pkc-xxxxx.region.provider.confluent.cloud:9092',
     'properties.security.protocol' = 'SASL_SSL',
     'properties.sasl.mechanism' = 'PLAIN',
     'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";',
     'format' = 'json'
   );
   ```
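For steps 1 and 2, a minimal Confluent CLI sketch; the pool name, cloud provider, region, sizing, and placeholder IDs are assumptions:

```bash
# Authenticate and select the environment (IDs are placeholders)
confluent login
confluent environment use env-xxxxx

# 1. Create a Flink compute pool (cloud, region, and sizing are assumptions)
confluent flink compute-pool create my-pool --cloud aws --region us-east-1 --max-cfu 10

# 2. Start the interactive Flink SQL shell against that pool
confluent flink shell --compute-pool lfcp-xxxxx --environment env-xxxxx
```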
See Confluent Cloud Flink documentation for more details.
See the Shift Left project to manage bigger Flink projects with a dedicated CLI.
Choosing the Right Deployment Approach¶
| Approach | Use Case | Pros | Cons |
|---|---|---|---|
| Local Binary | Development, Testing | Simple setup, fast iteration | Limited scalability; manual configuration and maintenance across distributed machines |
| Docker | Development, Testing | Containerized, reproducible | Manual orchestration |
| Kubernetes | Production | Scalable, production-ready | Complex setup |
| Confluent Cloud | Production | Fully managed, no ops | Vendor-controlled control plane |
Application deployment¶
For production, it is recommended to deploy in application mode, packaging the SQL, Python, or Java application as a jar.
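A minimal sketch of an application-mode submission on Kubernetes; the cluster id, image tag, and jar path are assumptions:

```bash
# Submit the packaged application in Kubernetes application mode
flink run-application \
  --target kubernetes-application \
  -Dkubernetes.cluster-id=my-first-application \
  -Dkubernetes.container.image.ref=custom-flink:latest \
  local:///opt/flink/usrlib/my-flink-job.jar
```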