Lab: Day to Day Data Engineer's Work¶
This lab presents a simple flow for standard activities Data Engineer may do to develop Confluent Flink Solution.
Prerequisites¶
During this lab, you will work on an existing Flink project, therefore clone the repository:
-
Set your configuration file and environment variables as presented in the setup lab
-
Be sure to have cloned the flink project demos git repository:
Create table¶
During the life of the shift left project, data engineers create need table. The tool support creating table with a common structure. As an example we will add a shipment source table:
The create folder will look like:
├── sources
│ └── abc
│ ├── src_shipments
│ │ ├── Makefile
│ │ ├── pipeline_definition.json
│ │ ├── sql-scripts
│ │ │ ├── ddl.src_abc_shipments.sql
│ │ │ └── dml.src_abc_shipments.sql
│ │ ├── tests
│ │ └── tracking.md
Unit test table¶
- Unit tests may be added to facts, views and dimension tables.The tool looks at the SQL content, the join, from and build test data for each sources. Here is an example of unit tests added.
The new added files are:
├── c360
│ └── fct_customer_360_profile
│ ├── Makefile
│ ├── pipeline_definition.json
│ ├── sql-scripts
│ │ ├── ddl.c360_fct_customer_profile.sql
│ │ └── dml.c360_fct_customer_profile.sql
│ ├── tests
│ │ ├── ddl_dim_c360_customer_transactions.sql
│ │ ├── ddl_src_c360_app_usage.sql
│ │ ├── ddl_src_c360_customers.sql
│ │ ├── ddl_src_c360_loyalty_program.sql
│ │ ├── ddl_src_c360_support_ticket.sql
│ │ ├── insert_dim_c360_customer_transactions_1.sql
│ │ ├── insert_src_c360_app_usage_1.sql
│ │ ├── insert_src_c360_customers_1.sql
│ │ ├── insert_src_c360_loyalty_program_1.sql
│ │ ├── insert_src_c360_support_ticket_1.sql
│ │ ├── README.md
│ │ ├── test_definitions.yaml
│ │ └── validate_c360_fct_customer_profile_1.sql
-
If you have a local LLM server it is possible to define environment variables and run the same command with the ai extension so the synthetic data will have coherent data to support joins
-
Tune the data manually to make the test relevant.
- Work on the Validation SQL to be sure, it reports test failure or success.
- Run the unit test
Assess Pipeline¶
- Start from a white page: The pipeline_definition.json files are, per table and local to your local folder. It may possible those files were created in git. But it is recommended to clean your local copy:
Recall that it will work from $PIPELINES folder.
-
Rebuild the metadata with your local work.
-
Verify an execution plan of one of your table. For example for the view of the customer 360 profile we can do:
The result looks somethig like, with the statement name, using naming convention, the current status of the statement, the compute pool on which the statement may run or is running, and the action the tool will take.
--- Ancestors: 9 --- Statement Name Status Compute Pool Action Upgrade Mode Table Name -------------------------------------------------------------------------------------------------------------------------------------------------------- --- dev-usw2-c360-dml-src-c360-app-usage STOPPED lfcp-xvrvmz To run Stateful src_c360_app_usage dev-usw2-c360-dml-src-c360-support-ticket COMPLET lfcp-xvrvmz Skip Stateful src_c360_support_ticket dev-usw2-c360-dml-src-c360-loyalty-program COMPLET lfcp-xvrvmz Skip Stateless src_c360_loyalty_program dev-usw2-c360-dml-src-c360-customers COMPLET lfcp-279od2 Skip Stateful src_c360_customers dev-usw2-c360-dml-src-c360-transactions STOPPED lfcp-xvrvmz To run Stateless src_c360_transactions dev-usw2-c360-dml-src-c360-tx-items STOPPED lfcp-xvrvmz To run Stateful src_c360_tx_items dev-usw2-c360-dml-src-c360-products COMPLET lfcp-xvrvmz Skip Stateless src_c360_products dev-usw2-c360-dml-dim-c360-customer-transactions STOPPED lfcp-x23ggx To run Stateful dim_c360_customer_transactions dev-usw2-c360-dml-c360-fct-customer-profile STOPPED lfcp-xvrvmz To run Stateful c360_fct_customer_profile --- Children to restart --- Statement Name Status Compute Pool Action Upgrade Mode Table Name -------------------------------------------------------------------------------------------------------------------------------------------------------- --- dev-usw2-c360-dml-customer-analytics-c360 STOPPED lfcp-xvrvmz Restart Stateless customer_analytics_c360It is followed to a list of compute pools with their current capacity.
Pool ID Pool Name Current/Max CFU Flink Statement name -------------------------------------------------------------------------------------------------------------------------------------------- lfcp-xvrvmz dev-j9r-pool 0/50 dev-usw2-c360-dml-src-c360-app-usage lfcp-xvrvmz dev-j9r-pool 0/50 dev-usw2-c360-dml-src-c360-support-ticket lfcp-xvrvmz dev-j9r-pool 0/50 dev-usw2-c360-dml-src-c360-loyalty-program lfcp-279od2 dev-src-c360-customers 0/30 dev-usw2-c360-dml-src-c360-customers lfcp-xvrvmz dev-j9r-pool 0/50 dev-usw2-c360-dml-src-c360-transactions lfcp-xvrvmz dev-j9r-pool 0/50 dev-usw2-c360-dml-src-c360-tx-items lfcp-xvrvmz dev-j9r-pool 0/50 dev-usw2-c360-dml-src-c360-products lfcp-x23ggx dev-dim-c360-customer-transactions 0/30 dev-usw2-c360-dml-dim-c360-customer-transactions lfcp-xvrvmz dev-j9r-pool 0/50 dev-usw2-c360-dml-c360-fct-customer-profile lfcp-xvrvmz dev-j9r-pool 0/50 dev-usw2-c360-dml-customer-analytics-c360
Deploy one to many tables¶
to deploy according to the execution plan, the deployment may include a default compute pool id, if a statement has no current pool id assigned to.