# Customer 360 Analytics
## How to consume this content
The project is organized in two parts:
- batch processing with Apache Spark to build the customer_analytics_c360 data product (a minimal sketch of this track follows the list)
- real-time processing with Confluent Cloud Flink to build the same customer_analytics_c360 data product
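For orientation, here is a minimal PySpark sketch of what the batch track could look like. The table names (dim_customer, fact_orders), columns, and file paths are illustrative assumptions, not the project's actual schema:

```python
# Minimal sketch of the batch track: join a fact table to a customer
# dimension (Kimball-style star schema) and aggregate per customer.
# All table names, columns, and paths below are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("c360_spark_processing").getOrCreate()

# Load an assumed dimension and fact table.
dim_customer = spark.read.parquet("data/dim_customer")
fact_orders = spark.read.parquet("data/fact_orders")

# Join facts to the customer dimension and aggregate per customer.
customer_analytics_c360 = (
    fact_orders.join(dim_customer, on="customer_id", how="inner")
    .groupBy("customer_id", "segment")
    .agg(
        F.count("*").alias("order_count"),
        F.sum("total_amount").alias("lifetime_value"),
    )
)

customer_analytics_c360.write.mode("overwrite").parquet("data/customer_analytics_c360")
```

The Flink track computes the same data product continuously over Kafka topics instead of batch Parquet files.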
## Folder Structure
- c360_*: a set of projects demonstrating how to define data as a product in Spark, and its equivalent for real-time processing in Flink SQL.
- c360_spark_processing: a batch implementation using the star schema and the Kimball method to organize facts and dimensions. The project description is here. This project was created with the `shift_left project init c360_spark_processing` command.
- c360_mock_data: a set of CSV files to create synthetic data.
- c360_api: a Python FastAPI-based REST API, built on top of the Spark data pipeline, that exposes Customer 360 analytics to Marketing, Product, and Finance teams.
- c360_flink_processing: builds the same analytics data product with Flink processing.
- Kafka_consumer: a CLI-based tool that consumes the pipeline's Kafka topics to validate intermediate and final results. A minimal consumer sketch follows this list.
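To illustrate what such a validation consumer involves, here is a minimal sketch using the confluent-kafka Python client. The broker address, consumer group id, and default topic are assumptions; a Confluent Cloud cluster would additionally need security settings:

```python
# Minimal topic-tailing consumer sketch (confluent-kafka client).
# Broker, group id, and default topic below are assumptions, not the
# repository's actual configuration.
import sys

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed local broker
    "group.id": "c360-validation",          # hypothetical group id
    "auto.offset.reset": "earliest",
})

# Topic name comes from the command line, defaulting to the data product topic.
topic = sys.argv[1] if len(sys.argv) > 1 else "customer_analytics_c360"
consumer.subscribe([topic])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}", file=sys.stderr)
            continue
        # Print key and value so per-topic results can be inspected by eye.
        print(msg.topic(), msg.key(), msg.value().decode("utf-8"))
finally:
    consumer.close()
```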
## To Do
- A Kafka producer to simulate injecting data into the raw topics.
- Add Tableflow to the IaC so the customer_analytics_c360 topic materializes as an Iceberg table in AWS S3 or Confluent storage.
- A Python Flask app with an HTML dashboard that presents the data products from Parquet files, using DuckDB as the query engine (see the query sketch after this list).
- Run the same SQL statements on Confluent Platform for Flink.
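For the dashboard item above, the query layer could be as small as the following DuckDB sketch. The Parquet path and column names (customer_id, lifetime_value) are assumptions, and `.df()` requires pandas to be installed:

```python
# Minimal DuckDB query over a data product's Parquet output.
# Path and columns are assumed for illustration.
import duckdb

# DuckDB queries Parquet files in place; no load step is required.
top_customers = duckdb.sql("""
    SELECT customer_id, lifetime_value
    FROM 'data/customer_analytics_c360/*.parquet'
    ORDER BY lifetime_value DESC
    LIMIT 10
""").df()

print(top_customers)
```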