
Quick Start for the demonstration

For developers, see the specific instructions for running the demonstration locally, including descriptions of the code structure and implementation approach.

Prerequisites

Gather API Keys

At a minimum, you need the API key and secret for the user that will run Terraform, the Confluent CLI, or the shift_left CLI.

The backend reads its environment variables from a single file: ./backend/.env.

cp ./backend/.env.example ./backend/.env

Modify the top section of the file if you will create the environment, Kafka cluster, Schema Registry, and Flink compute pool with Terraform.

CLOUD_PROVIDER="aws"
CLOUD_REGION="us-west-2"
ORG_ID="...."
CONFLUENT_CLOUD_API_KEY=....
CONFLUENT_CLOUD_API_SECRET=cflt....
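The lower part of the file holds the per-service credentials that Terraform will generate. These keys (values elided, mirroring the backend_env output shown further down this page) are filled in later by the update script:

```
FLINK_API_KEY=....
FLINK_API_SECRET=....
FLINK_COMPUTE_POOL_ID=....
FLINK_REST_ENDPOINT=....
KAFKA_API_KEY=....
KAFKA_API_SECRET=....
KAFKA_BOOTSTRAP_SERVERS=....
KAFKA_CLUSTER_ID=....
KAFKA_REST_ENDPOINT=....
KAFKA_SASL_PASSWORD=....
KAFKA_SASL_USERNAME=....
PRINCIPAL_ID=....
SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO=....
SCHEMA_REGISTRY_URL=....
```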

Infrastructure as Code

If you do not have a Confluent Cloud environment, a Kafka cluster, a Schema Registry, and Flink compute pools, you can use the Terraform configuration in the IaC folder. You can also reuse existing resources; this is explained in a subsection below.

  • Follow the classical Terraform steps:
terraform init
terraform plan
terraform apply --auto-approve

The outputs should look like:

env_display_name = "health-env"
env_id = "env-r..."
flink_compute_pool_id = "lfcp-9....m"
flink_rest_endpoint = "https://flink.us-west-2.aws.confluent.cloud"
flink_statements_ddl_raw = {}
flink_statements_ddl_rmd = {}
flink_statements_dml_raw = {}
flink_statements_dml_rmd = {}
kafka_bootstrap_endpoint = "SASL_SSL://pkc-......us-west-2.aws.confluent.cloud:9092"
kafka_cluster_display_name = "health-kafka"
kafka_cluster_id = "lkc-...."
kafka_rest_endpoint = "https://pkc-.....us-west-2.aws.confluent.cloud:443"
schema_registry_endpoint = "https://psrc-......us-west-2.aws.confluent.cloud"
schema_registry_id = "lsrc-...."
  • Run the following command to display the environment variables:

     terraform output -json backend_env
    

    The response is a JSON document with all the environment variables:

    {"FLINK_API_KEY":"....",
    "FLINK_API_SECRET":"cflt.....",
    "FLINK_COMPUTE_POOL_ID":"lfcp-....",
    "FLINK_REST_ENDPOINT":"https://flink......confluent.cloud",
    "KAFKA_API_KEY":"....",
    "KAFKA_API_SECRET":"cflt.....",
    "KAFKA_BOOTSTRAP_SERVERS":"SASL_SSL://pkc-......confluent.cloud:9092",
    "KAFKA_CLUSTER_ID":"lkc-...",
    "KAFKA_REST_ENDPOINT":"https://pkc-....confluent.cloud:443",
    "KAFKA_SASL_PASSWORD":"cfltVO...A",
    "KAFKA_SASL_USERNAME":"....",
    "PRINCIPAL_ID":"sa-...","SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO":"....",
    "SCHEMA_REGISTRY_URL":"https://psrc-....confluent.cloud"}
    
  • Modify backend/.env with the created API keys and secrets:

    cd ./scripts
     ./update_backend_env_from_terraform.sh  
    

    The backend/.env should get the new environment variables settings.
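The update script itself is not reproduced on this page. As a minimal sketch of what such a conversion can look like, the snippet below turns a flat JSON document into KEY=VALUE lines; a literal sample stands in for the real `terraform output -json backend_env`, and the naive pure-shell parsing assumes no commas or quotes inside values.

```shell
# Sample standing in for: terraform output -json backend_env
sample='{"KAFKA_CLUSTER_ID":"lkc-123","SCHEMA_REGISTRY_URL":"https://psrc-x.confluent.cloud"}'

# Naive flat-JSON to KEY=VALUE conversion:
# strip braces/quotes, split entries on commas, turn the first ':' into '='.
env_lines=$(printf '%s' "$sample" | tr -d '{}"' | tr ',' '\n' | sed 's/:/=/')
printf '%s\n' "$env_lines"
# prints:
# KAFKA_CLUSTER_ID=lkc-123
# SCHEMA_REGISTRY_URL=https://psrc-x.confluent.cloud
```

In practice, a jq one-liner such as `terraform output -json backend_env | jq -r 'to_entries[] | "\(.key)=\(.value)"'` is more robust, if jq is available.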

Reuse existing Confluent Cloud Resources

In case you want to reuse an existing Confluent environment, Kafka cluster, Schema Registry, API keys, and secrets...

To Be Completed

Create CDC Topic

  • Login to confluent using cli:

    confluent login
    

  • Run the shell script to create the prescription topic:

    cd connect
    ./create-topics.sh
    cd ..
    

Define environment variables in current shell

source set_env_var

If you plan to use the shift_left utilities, add the following step:

source set_sl_env
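The contents of set_env_var and set_sl_env are not shown here. Assuming they export the backend variables into the current shell, a minimal equivalent looks like the sketch below, where a throwaway sample file stands in for ./backend/.env:

```shell
# Create a throwaway stand-in for ./backend/.env
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
KAFKA_CLUSTER_ID=lkc-123
FLINK_COMPUTE_POOL_ID=lfcp-456
EOF

set -a           # mark every variable assigned while sourcing for export
. "$envfile"     # source the KEY=VALUE pairs
set +a
rm -f "$envfile"

echo "$FLINK_COMPUTE_POOL_ID"   # prints: lfcp-456
```

Note that this only works for files whose values need no shell quoting.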

Local Execution of the Demo Components

docker compose up -d

Then access the demonstration web application.

Using Shift Left CLI

  • Prepare the metadata:
    source set_sl_env 
    # Build the table inventory
    shift_left table build-inventory
    # build the pipelines metadata
    shift_left pipeline delete-all-metadata
    shift_left pipeline build-all-metadata
    
  • Change some settings for the current source topics:

    shift_left pipeline prepare $PIPELINES/rmd/alter_tables.sql
    

  • Assess the execution path for one of the leaf tables:

     shift_left pipeline build-execution-plan --table-name hc_fct_drift_evts --compute-pool-id $FLINK_COMPUTE_POOL_ID 
    

  • Deploy a full pipeline

    shift_left pipeline deploy --table-name hc_fct_drift_evts --compute-pool-id $FLINK_COMPUTE_POOL_ID
    

  • Undeploy: the best approach is to undeploy per data product:

    shift_left pipeline undeploy --product-name rmd --compute-pool-id $FLINK_COMPUTE_POOL_ID
    shift_left pipeline undeploy --product-name raw --compute-pool-id $FLINK_COMPUTE_POOL_ID
    

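The two undeploy calls above can be scripted per data product. The helper below is hypothetical (not a script from the repo) and only prints the commands instead of executing them, so it is safe to dry-run:

```shell
# Default only for the dry-run; normally exported by set_env_var
FLINK_COMPUTE_POOL_ID="${FLINK_COMPUTE_POOL_ID:-lfcp-example}"

# Print one undeploy command per data product, rmd before raw
undeploy_cmds=$(for product in rmd raw; do
  printf 'shift_left pipeline undeploy --product-name %s --compute-pool-id %s\n' \
    "$product" "$FLINK_COMPUTE_POOL_ID"
done)
printf '%s\n' "$undeploy_cmds"
```

Replace the printf with the actual command invocation to execute the undeploys.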
Using Terraform