Command Summary¶
This section presents a quick summary of the tools used for the migration. Be sure to set the environment variables SRC_FOLDER, STAGING, and CONFIG_FILE:
export SRC_FOLDER=../../your-src-dbt-folder/models
export STAGING=../../flink-project/staging
export CONFIG_FILE=../../flink-project/config.yaml
Get parent pipeline¶
To get the parent hierarchy of a fact or dimension table, use the pipeline_helper.py
tool. The output reports the tables the given table depends on, up to the sources, based on an inventory of SQL files. This inventory is built from the folder specified with the -i or --inventory argument.
- Example of searching in the dbt or source project
python pipeline_helper.py -f $SRC_FOLDER/facts/fct_users.sql -i $SRC_FOLDER
- Example of searching for the pipeline of a migrated file within the migrated content
python pipeline_helper.py -f $STAGING/../pipelines/facts/fct_users.sql -i $STAGING/../pipelines
- The same search, but against the staged content
python pipeline_helper.py -f $STAGING/../pipelines/facts/fct_users.sql -i $STAGING
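To run the same report for every fact model at once, a simple shell loop can be used; this is only a sketch, assuming the fact models sit under $SRC_FOLDER/facts as in the example above:
# Illustrative only: print the parent hierarchy for each fact model in the source project
for f in $SRC_FOLDER/facts/*.sql; do
  python pipeline_helper.py -f "$f" -i $SRC_FOLDER
done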
Find the tables using a given table¶
python find_table_user.py -t table_name -r root_folder_to_search_in
# example
python find_table_user.py -t users -r $STAGING
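The same command can also be pointed at the original dbt project to compare results; an illustrative variant, assuming the users model exists there as well:
# Illustrative only: find which models in the dbt source project use the users table
python find_table_user.py -t users -r $SRC_FOLDER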
Process a fact or dimension table¶
- Generate Flink SQL statements from one fact or dimension table, recursively processing its parents up to the source tables. The created SQL statements are saved into the staging temporary folder.
python process_src_tables.py -f $SRC_FOLDER/facts/fct_users.sql -o $STAGING/app -pd
Process all the source tables¶
- For all tables in the sources folder, create the matching Flink SQL DDL and DML statements in a temporary folder. Once generated, the deduplication logic can be finalized manually and the result moved to the final pipelines folder (see the sketch after the command below). This approach may not be needed if you use the process-parent-hierarchy option.
python process_src_tables.py -f $SRC_FOLDER/sources -o $STAGING/sources
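Once the generated statements have been reviewed, they can be promoted to the final pipelines folder; the copy below is only a sketch, assuming the folder layout used in the earlier examples, with <table_name> as a placeholder:
# Illustrative only: promote a reviewed source table from staging to the final pipelines folder
cp -r $STAGING/sources/<table_name> $STAGING/../pipelines/sources/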
Create a sink table structure¶
The goal of this tool is to create a folder structure to start migrating SQL manually:
python create_sink_structure.py -t fct_user -o $STAGING/fct_user
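The exact files created by the tool are not listed here; a quick way to inspect the generated structure, illustrative only:
# List everything created under the new sink folder
find $STAGING/fct_user -type f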
Update the source SQL¶
The typical update is to change the source table name, which may differ per environment; this tool can apply such a change globally. Another use case is to limit the data used during development to a specific set of records by adding a WHERE clause.
python change_src_sql.py -s <source_folder> -w "where condition as string"
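A concrete invocation might look like the following; the staging folder and the condition are examples only, and the created_at column is hypothetical:
# Illustrative only: limit development data to recent records across the staged source SQL files
python change_src_sql.py -s $STAGING/sources -w "where created_at > '2024-01-01'"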
Others - Need to be updated¶
- Once the source DDL has executed successfully, generate test data (5 records) and send them to the matching topic.
# -o is the output file receiving the JSON array of JSON objects to be sent to the topic, -t is the table name, and -n is the number of records to create
python generate_data_for_table.py -o data.json -t sys_user_raw -n 5
- Send the test data to the target topic
# -t for the topic name and -s for the source folder (here portal_role)
python kafka_avro_producer.py -t portal_role_raw -s ../pipelines/sources/portal_role