Flink Getting Started Guide
Updates
- Created 2018
- Updated 02/14/2025: improved the notes on Kubernetes deployment, added a simple demo reference, review done.
- Updated 03/30/2025: converged the notes and updated the referenced links.
This guide covers four different approaches to deploy and run Apache Flink:
- Local Binary Installation
- Docker-based Deployment
- Kubernetes Deployment (local cluster with Colima or minikube)
- Confluent Cloud Managed Service
Prerequisites
Before getting started, ensure you have:
- Java 11 or higher installed
- Docker Engine and Docker CLI (for Docker and Kubernetes deployments)
- kubectl and Helm (for Kubernetes deployment)
- Confluent Cloud account (for Confluent Cloud deployment)
- Git (to clone this repository)
1. Local Binary Installation
This approach is ideal for development and testing on a single machine.
Installation Steps
- Download and extract the Flink binary distribution.
- Set the environment variables.
- Start the Flink cluster.
- Access the Web UI at http://localhost:8081
- Stop the cluster.
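The steps above can be sketched as a few shell helpers. This is a minimal sketch, not the exact commands used in this repository: the Flink release (1.20.0), the Scala build (2.12), the Apache archive URL, and the function names are assumptions to adapt to the version you target.

```shell
# Assumed release and Scala build; replace with the version you target.
FLINK_VERSION="1.20.0"
SCALA_VERSION="2.12"
FLINK_DIR="flink-${FLINK_VERSION}"
FLINK_TGZ="${FLINK_DIR}-bin-scala_${SCALA_VERSION}.tgz"
FLINK_URL="https://archive.apache.org/dist/flink/${FLINK_DIR}/${FLINK_TGZ}"

# Download the archive into the current directory and unpack it.
install_flink() {
  curl -fLO "$FLINK_URL" && tar -xzf "$FLINK_TGZ"
}

# Point FLINK_HOME at the unpacked directory and expose the bin scripts.
export FLINK_HOME="$PWD/$FLINK_DIR"
export PATH="$PATH:$FLINK_HOME/bin"

# Start and stop the local standalone cluster (Web UI on port 8081).
start_flink() { "$FLINK_HOME/bin/start-cluster.sh"; }
stop_flink()  { "$FLINK_HOME/bin/stop-cluster.sh"; }
```

Running `install_flink` followed by `start_flink` should bring up a local cluster. `stop_flink` shuts it down again.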
Running Applications
- Submit a sample job.
- Start the SQL Client.
- [Optional] Start the SQL Gateway for concurrent SQL queries.
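As an illustration of these three steps, the helpers below wrap scripts that ship with the open-source Flink distribution. The WordCount example jar and the fallback FLINK_HOME value are assumptions; any other example jar or installation path works the same way.

```shell
# Assumes FLINK_HOME points at an unpacked Flink distribution.
FLINK_HOME="${FLINK_HOME:-$PWD/flink-1.20.0}"

# Submit the WordCount streaming example that ships with the distribution.
submit_wordcount() {
  "$FLINK_HOME/bin/flink" run "$FLINK_HOME/examples/streaming/WordCount.jar"
}

# Open the interactive SQL client against the running cluster.
start_sql_client() {
  "$FLINK_HOME/bin/sql-client.sh"
}

# Optional: start the SQL Gateway with its REST endpoint bound to localhost.
start_sql_gateway() {
  "$FLINK_HOME/bin/sql-gateway.sh" start -Dsql-gateway.endpoint.rest.address=localhost
}
```

The cluster must already be running before calling `submit_wordcount` or `start_sql_client`.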
See the product documentation for more examples. For Python Table API demonstrations, see this chapter.
Troubleshooting
- If port 8081 is already in use, change the REST port (rest.port) in conf/flink-conf.yaml
- Check the logs in the $FLINK_HOME/log directory
- Ensure Java 11+ is installed and JAVA_HOME is set correctly
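These checks could be scripted roughly as follows. The helper names are hypothetical, and `set_rest_port` assumes a flat-key configuration file such as conf/flink-conf.yaml. Recent Flink releases may use conf/config.yaml instead.

```shell
# Assumes FLINK_HOME points at an unpacked Flink distribution.
FLINK_HOME="${FLINK_HOME:-$PWD/flink-1.20.0}"

# Set rest.port in a flat-key Flink configuration file, replacing an existing
# uncommented entry or appending one when none is present.
set_rest_port() {
  conf_file="$1"
  port="$2"
  if grep -q '^rest\.port:' "$conf_file"; then
    sed -i.bak "s/^rest\.port:.*/rest.port: ${port}/" "$conf_file"
  else
    printf 'rest.port: %s\n' "$port" >> "$conf_file"
  fi
}

# Show the tail of every log file written by the local cluster.
show_flink_logs() {
  tail -n 50 "$FLINK_HOME"/log/*.log
}

# Print the Java version and the JAVA_HOME the shell currently sees.
check_java() {
  java -version 2>&1
  echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
}
```

For example, `set_rest_port "$FLINK_HOME/conf/flink-conf.yaml" 8082` moves the Web UI to port 8082 after the cluster is restarted.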
2. Docker-based Deployment
This approach provides a containerized deployment using Docker Compose.
Prerequisites
- Docker Engine
- Docker Compose
- Git (to clone this repository)
Quick Start
- Build the custom Flink image (if needed).
- Start the Flink session cluster.
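A possible shape for these two steps is sketched below, run from the repository root. The Dockerfile folder and the compose file path are the ones named in this guide. The image tag is a placeholder and must match whatever the compose file references, and the sketch uses the `docker compose` plugin syntax rather than the older `docker-compose` binary.

```shell
# Paths come from this repository; the image tag is a hypothetical placeholder.
IMAGE_DIR="deployment/custom-flink-image"
COMPOSE_FILE="deployment/docker/flink-oss-docker-compose.yaml"
IMAGE_TAG="custom-flink:latest"

# Build the custom image that bundles the extra connector jars.
build_flink_image() { docker build -t "$IMAGE_TAG" "$IMAGE_DIR"; }

# Start the session cluster in the background, and tear it down again.
start_session_cluster() { docker compose -f "$COMPOSE_FILE" up -d; }
stop_session_cluster()  { docker compose -f "$COMPOSE_FILE" down; }
```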
Docker Compose with Kafka
To run Flink with Kafka:
Customization
- Modify deployment/custom-flink-image/Dockerfile to add the required connectors
- Update deployment/docker/flink-oss-docker-compose.yaml for configuration changes
During development, we can use Docker Compose to start a simple Flink session cluster, or a standalone job manager that executes a single job whose application jar is mounted into the container. The same environment can also run SQL-based Flink applications.
Because the task manager executes the job, the container running the Flink code must have access to the jars needed to connect to external systems such as Kafka, or to use tools such as FlinkFaker. For that reason, deployment/custom-flink-image contains a Dockerfile that fetches the needed jars and builds a custom Flink image, which can be used for the task manager and the SQL client. Always update the jar versions when moving to a new Flink version.
Docker hub and maven links
3. Kubernetes Deployment
This approach provides a scalable, production-ready deployment using Kubernetes.
Prerequisites
- Kubernetes cluster (local or cloud)
- kubectl
- Helm
- Colima (optional, for a local Kubernetes cluster)
Deployment Steps
- Install the Flink Operator.
- Deploy the Flink application.
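Assuming the operator meant here is the Apache Flink Kubernetes Operator, the two steps might look like the sketch below. The operator release (1.10.0) and the cert-manager manifest version are assumptions taken from the operator's public quick start and should be replaced with currently published releases. The manifest passed to `deploy_flink_app` is any file containing a FlinkDeployment resource.

```shell
# Assumed operator release; pick one that is currently published.
OPERATOR_VERSION="1.10.0"
OPERATOR_REPO="https://downloads.apache.org/flink/flink-kubernetes-operator-${OPERATOR_VERSION}/"
# cert-manager is needed by the operator's admission webhook (assumed version).
CERT_MANAGER_YAML="https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml"

# Install cert-manager, register the Helm repository, and install the operator.
install_flink_operator() {
  kubectl create -f "$CERT_MANAGER_YAML" &&
  helm repo add flink-operator-repo "$OPERATOR_REPO" &&
  helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
}

# Apply a FlinkDeployment manifest; $1 is the path to the YAML file.
deploy_flink_app() {
  kubectl apply -f "$1"
}
```

After `install_flink_operator`, a call such as `deploy_flink_app my-flink-deployment.yaml` (a hypothetical file name) submits the application, and `kubectl get pods` shows the job manager and task manager pods once they start.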
Confluent Platform on Kubernetes
For Confluent Platform deployment:
- Install the Confluent Operator.
- Deploy the Kafka cluster.
- Deploy the Flink applications.
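The first two steps could be sketched as below, assuming Confluent for Kubernetes is installed from the public Confluent Helm repository. The `confluent` namespace, the release name, and the manifest argument are assumptions. Deploying the Flink applications is left out because it depends on the setup described in the Kubernetes deployment chapter.

```shell
# Assumed namespace for all Confluent Platform resources.
CONFLUENT_NS="confluent"

# Install the Confluent for Kubernetes operator with Helm.
install_confluent_operator() {
  kubectl create namespace "$CONFLUENT_NS" &&
  helm repo add confluentinc https://packages.confluent.io/helm &&
  helm repo update &&
  helm upgrade --install confluent-operator confluentinc/confluent-for-kubernetes \
    --namespace "$CONFLUENT_NS"
}

# Apply the custom resources describing the Kafka cluster; $1 is the YAML file.
deploy_kafka_cluster() {
  kubectl apply -f "$1" --namespace "$CONFLUENT_NS"
}
```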
See the Kubernetes deployment chapter and the Confluent operator documentation for detailed instructions.
4. Confluent Cloud Deployment
This approach provides a fully managed Flink service.
Prerequisites
- Confluent Cloud account
- Confluent CLI installed
- Environment configured
Getting Started
- Create a Flink compute pool.
- Start the SQL client.
- Submit SQL statements:
    CREATE TABLE my_table (
      id INT,
      name STRING
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'my-topic',
      'properties.bootstrap.servers' = 'pkc-xxxxx.region.provider.confluent.cloud:9092',
      'properties.security.protocol' = 'SASL_SSL',
      'properties.sasl.mechanism' = 'PLAIN',
      'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";',
      'format' = 'json'
    );
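The first two steps can be driven from the Confluent CLI. The helpers below are a sketch under assumptions: the pool name, cloud, region, CFU limit, and the environment and pool identifiers are placeholders, and `confluent login` must have been run beforehand.

```shell
# Create a compute pool.
# $1 pool name, $2 cloud provider, $3 region, $4 environment id
create_compute_pool() {
  confluent flink compute-pool create "$1" \
    --cloud "$2" --region "$3" --max-cfu 5 --environment "$4"
}

# Open the interactive Flink SQL shell.
# $1 compute pool id, $2 environment id
open_flink_shell() {
  confluent flink shell --compute-pool "$1" --environment "$2"
}
```

For example, `create_compute_pool my-pool aws us-east-1 env-xxxxx` followed by `open_flink_shell lfcp-xxxxx env-xxxxx`, with the identifiers replaced by the ones reported by the CLI.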
See Confluent Cloud Flink documentation for more details.
Choosing the Right Deployment Approach
| Approach | Use Case | Pros | Cons |
|---|---|---|---|
| Local Binary | Development, Testing | Simple setup, Fast iteration | Limited scalability; running on several machines requires manual configuration and maintenance |
| Docker | Development, Testing | Containerized, Reproducible | Manual orchestration |
| Kubernetes | Production | Scalable, Production-ready | Complex setup |
| Confluent Cloud | Production | Fully managed, No ops | Dependence on the vendor's control plane |