Environment Setup - Deeper dive¶
Versions
Created January 2025. Updated 03/21/25: setup guide split between users of the CLI and developers of this project. Updated 08/09: config file explanation and how to get the parameters.
The new setup lab addresses how to set up the shift_left CLI and how to validate the installation.
This chapter addresses configuration review and tuning.
The config.yaml file¶
The config.yaml configuration file is used extensively to tune shift_left per environment, and is referenced by the CONFIG_FILE environment variable. You should have a separate config.yaml for each Kafka cluster, Schema Registry, and Flink environment.
- Copy the `config_tmpl.yaml` template file to keep some important parameters for the CLI.
- Modify the `config.yaml` with values from your Confluent Cloud settings. See the tabs below for the different sections of this file:
- Get the Kafka cluster URL from the `Confluent Console > Cluster Settings` page. The URL contains the cluster id and a second id that is used for the REST API.
- Get the environment ID from the `Environment details` in the Confluent Console. Also note the cloud provider and region.
confluent_cloud:
  environment_id: env-20xxxx
  region: us-west-2
  provider: aws
  organization_id: 5329.....96
- organization_id is defined under the user account > Organization settings.
- Flink settings are per environment. Get the URL endpoint by going to `Environments > one_of_the_env > Flink > Endpoints`, and copy the private or public endpoint.
- The compute pool id is used as the default for running Flink queries.
- The catalog name is the name of the environment, and the database name is the name of the Kafka cluster.
The app section defines a set of capabilities to tune the CLI.
app:
  accepted_common_products: ['common', 'seeds']
  sql_content_modifier: shift_left.core.utils.table_worker.ReplaceEnvInSqlContent
  dml_naming_convention_modifier: shift_left.core.utils.naming_convention.DmlNameModifier
  compute_pool_naming_convention_modifier: shift_left.core.utils.naming_convention.ComputePoolNameModifier
  data_limit_where_condition : rf"where tenant_id in ( SELECT tenant_id FROM tenant_filter_pipeline WHERE product = {product_name})"
  data_limit_replace_from_reg_ex: r"\s*select\s+\*\s+from\s+final\s*;?"
  data_limit_table_type: source
  data_limit_column_name_to_select_from: tenant_id
  post_fix_unit_test: _ut
  post_fix_integration_test: _it
- `post_fix_unit_test` and `post_fix_integration_test` define the strings appended to table names during unit testing and integration testing, respectively.
- `data_limit_replace_from_reg_ex`, `data_limit_table_type`, and `data_limit_column_name_to_select_from` are used to add data filtering to all the source tables, based on one column name to filter on. The regex specifies how to find the `select * from final` clause, which is the last string in most Flink statements implemented with CTEs.
- `sql_content_modifier` specifies the custom class used to modify SQL content depending on the target environment. This is a way to extend the CLI logic for specific usages.
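To make the data-limit and postfix parameters more concrete, here is a minimal Python sketch of how such values could be applied to a statement. The regex, where condition, and postfix are copied from the app section above; the helper names `add_data_limit` and `unit_test_table_name`, the sample statement, and the way the where condition is attached to the final select are illustrative assumptions, not the CLI's actual implementation.

```python
import re

# Values copied from the app section of config.yaml shown above.
DATA_LIMIT_REPLACE_FROM_REG_EX = r"\s*select\s+\*\s+from\s+final\s*;?"
DATA_LIMIT_WHERE_CONDITION = (
    "where tenant_id in ( SELECT tenant_id FROM tenant_filter_pipeline "
    "WHERE product = {product_name})"
)
POST_FIX_UNIT_TEST = "_ut"


def add_data_limit(sql: str, product_name: str) -> str:
    """Hypothetical helper: rewrite the trailing `select * from final`
    so that it carries the configured where condition."""
    where = DATA_LIMIT_WHERE_CONDITION.format(product_name=product_name)
    replacement = f"\nSELECT * FROM final {where};"
    # A function replacement keeps backslashes in `replacement` literal.
    return re.sub(
        DATA_LIMIT_REPLACE_FROM_REG_EX,
        lambda _match: replacement,
        sql,
        count=1,
        flags=re.IGNORECASE,
    )


def unit_test_table_name(table_name: str) -> str:
    """Hypothetical helper: append the unit-test postfix to a table name."""
    return table_name + POST_FIX_UNIT_TEST


# Made-up statement following the CTE pattern described above.
statement = """INSERT INTO fct_orders
WITH final AS (
    SELECT order_id, tenant_id FROM src_orders
)
SELECT * FROM final;"""

# The product name is passed already quoted so the resulting SQL is valid.
limited = add_data_limit(statement, "'p1'")
print(limited)
print(unit_test_table_name("src_orders"))
```

Only the last `SELECT * FROM final;` is rewritten, because the regex requires `*` right after `select`; the select inside the CTE is left untouched.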
Configuration File Setup¶
- Update the content of the config.yaml to reflect your Confluent Cloud environment. (For the commands used for migration, you do not need Kafka settings.)
- Set the environment variables before using the tool: modify the CONFIG_FILE, FLINK_PROJECT, SRC_FOLDER, and SL_LLM_* variables in your environment file.
- Source that file.
- Validate config.yaml.
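As an illustration of what a basic check of the configuration could look like, the following Python sketch verifies that the confluent_cloud keys shown earlier are present in an already-parsed configuration. It is independent of the CLI's own validation; the names `REQUIRED_KEYS`, `find_missing_keys`, and `config_file_path` are hypothetical, and a real config.yaml contains more sections than the one checked here.

```python
import os

# Keys taken from the confluent_cloud example above; other sections
# (kafka, flink, app, ...) would need their own entries.
REQUIRED_KEYS = {
    "confluent_cloud": ["environment_id", "region", "provider", "organization_id"],
}


def config_file_path() -> str:
    """Return the path held by the CONFIG_FILE environment variable."""
    path = os.environ.get("CONFIG_FILE")
    if not path:
        raise RuntimeError("CONFIG_FILE is not set")
    return path


def find_missing_keys(config: dict) -> list[str]:
    """List the dotted paths of required keys that are absent or empty."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section) or {}
        for key in keys:
            if not values.get(key):
                missing.append(f"{section}.{key}")
    return missing


# Sample dictionary standing in for a parsed config.yaml.
sample = {
    "confluent_cloud": {
        "environment_id": "env-20xxxx",
        "region": "us-west-2",
        "provider": "aws",
    }
}
print(find_missing_keys(sample))  # ['confluent_cloud.organization_id']
```

With PyYAML installed (an assumption), the dictionary could be obtained with `yaml.safe_load` applied to the file at `config_file_path()`.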
Security access
The config.yaml file is ignored by Git, so having the keys in this file is not a major concern, as it is used by developers only. Accessing secrets through a key manager API could be a future enhancement.
Environment variables¶
This section explains how to use environment variables to securely manage API keys and secrets instead of storing them in config.yaml files.
The Shift Left utility now supports environment variables for sensitive configuration values. Environment variables take precedence over config.yaml values, allowing you to:
- Keep sensitive data out of configuration files
- Use different credentials for different environments
- Securely manage secrets in CI/CD pipelines
- Follow security best practices
Kafka Section¶
| Environment Variable | Config Path | Description |
|---|---|---|
| SL_KAFKA_API_KEY | kafka.api_key | Kafka API key |
| SL_KAFKA_API_SECRET | kafka.api_secret | Kafka API secret |
Confluent Cloud Section¶
| Environment Variable | Config Path | Description |
|---|---|---|
| SL_CONFLUENT_CLOUD_API_KEY | confluent_cloud.api_key | Confluent Cloud API key |
| SL_CONFLUENT_CLOUD_API_SECRET | confluent_cloud.api_secret | Confluent Cloud API secret |
Flink Section¶
| Environment Variable | Config Path | Description |
|---|---|---|
| SL_FLINK_API_KEY | flink.api_key | Flink API key |
| SL_FLINK_API_SECRET | flink.api_secret | Flink API secret |
Priority Order¶
Each setting is resolved in the following order:
- Environment Variables (highest priority)
- Config.yaml values (fallback)
- Default values set in the code
If an environment variable is set, it overrides the corresponding value in config.yaml. If a value is set in config.yaml, it overrides the default value.
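The precedence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `lookup` and `resolve` are hypothetical, and the mapping only reuses the variable names and config paths from the tables above; it is not the utility's actual code.

```python
import os

_MISSING = object()

# Environment variable -> config path, as listed in the tables above.
ENV_TO_CONFIG_PATH = {
    "SL_KAFKA_API_KEY": "kafka.api_key",
    "SL_KAFKA_API_SECRET": "kafka.api_secret",
    "SL_CONFLUENT_CLOUD_API_KEY": "confluent_cloud.api_key",
    "SL_CONFLUENT_CLOUD_API_SECRET": "confluent_cloud.api_secret",
    "SL_FLINK_API_KEY": "flink.api_key",
    "SL_FLINK_API_SECRET": "flink.api_secret",
}


def lookup(config: dict, dotted_path: str):
    """Walk a nested dict following a dotted path such as 'kafka.api_key'."""
    node = config
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def resolve(env_var: str, config: dict, default=None, environ=None):
    """Environment variable first, then config.yaml value, then default."""
    environ = os.environ if environ is None else environ
    if environ.get(env_var):
        return environ[env_var]
    value = lookup(config, ENV_TO_CONFIG_PATH[env_var])
    if value is not _MISSING and value is not None:
        return value
    return default


config = {"kafka": {"api_key": "from-config"}}

# The environment variable wins over config.yaml.
print(resolve("SL_KAFKA_API_KEY", config, environ={"SL_KAFKA_API_KEY": "from-env"}))  # from-env
# Without the environment variable, the config.yaml value is used.
print(resolve("SL_KAFKA_API_KEY", config, environ={}))  # from-config
# With neither, the default applies.
print(resolve("SL_KAFKA_API_SECRET", config, default="from-default", environ={}))  # from-default
```

Passing `environ` explicitly keeps the example deterministic; leaving it out makes `resolve` read the real process environment.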
Setting Environment Variables¶
Bash/Zsh¶
export SL_KAFKA_API_KEY="your-kafka-api-key"
export SL_KAFKA_API_SECRET="your-kafka-api-secret"
export SL_FLINK_API_KEY="your-flink-api-key"
export SL_FLINK_API_SECRET="your-flink-api-secret"
export SL_CONFLUENT_CLOUD_API_KEY="your-confluent-cloud-api-key"
export SL_CONFLUENT_CLOUD_API_SECRET="your-confluent-cloud-api-secret"
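To confirm from Python that the exported variables are visible to a process, a small check such as the following can help. It is a sketch that assumes all six credentials are supplied through the environment rather than config.yaml; `missing_variables` and `masked` are hypothetical helper names.

```python
import os

# The six variables documented in the tables above.
REQUIRED_VARIABLES = [
    "SL_KAFKA_API_KEY",
    "SL_KAFKA_API_SECRET",
    "SL_FLINK_API_KEY",
    "SL_FLINK_API_SECRET",
    "SL_CONFLUENT_CLOUD_API_KEY",
    "SL_CONFLUENT_CLOUD_API_SECRET",
]


def missing_variables(environ=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


def masked(value: str) -> str:
    """Show only the first characters so secrets do not end up in logs."""
    return value[:4] + "****" if len(value) > 4 else "****"


if __name__ == "__main__":
    for name in REQUIRED_VARIABLES:
        value = os.environ.get(name)
        print(name, "=", masked(value) if value else "<not set>")
    print("missing:", missing_variables())
```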
Using .env File¶
Create a .env file (don't commit this to version control):
SL_KAFKA_API_KEY=your-kafka-api-key
SL_KAFKA_API_SECRET=your-kafka-api-secret
SL_FLINK_API_KEY=your-flink-api-key
SL_FLINK_API_SECRET=your-flink-api-secret
SL_CONFLUENT_CLOUD_API_KEY=your-confluent-cloud-api-key
SL_CONFLUENT_CLOUD_API_SECRET=your-confluent-cloud-api-secret
Then load it before running your application:
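One way to do this from Python without extra dependencies is sketched below. The helpers `parse_env_lines` and `load_env_file` are hypothetical names; the parser only handles the simple `KEY=value` lines shown above, plus comments and optional surrounding quotes.

```python
import os


def parse_env_lines(lines) -> dict:
    """Parse KEY=value lines, skipping blanks and '#' comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Load a .env file into os.environ.

    Variables already set in the environment are kept unless
    override=True is passed.
    """
    with open(path, encoding="utf-8") as handle:
        values = parse_env_lines(handle)
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Calling `load_env_file()` at the start of a script makes the values available to anything that reads `os.environ` afterwards. The python-dotenv package provides similar behavior through `load_dotenv()` if an external dependency is acceptable.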
Never commit secrets to version control: use .gitignore to exclude .env files.
Common Issues¶
- "Missing environment variables" error
    - Check that environment variable names are correct (case-sensitive)
    - Verify that variables are exported in your shell
    - Use `env | grep SL` to see what's set
- To see all supported environment variables, you can call the help function in Python: