Customer 360 Analytics

How to consume this content

The project is organized in two parts:

  • batch processing with Apache Spark to build the customer_analytics_c360 data product (see the sketch after this list)
  • real-time processing with Confluent Cloud Flink to build the same customer_analytics_c360 data product
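
As a rough illustration of the batch track, the sketch below assembles a customer_analytics_c360 product with PySpark. The input tables, paths, and column names here are assumptions made for illustration only, not the project's actual schema.

```python
# Minimal sketch of the batch track; customers/orders tables, paths,
# and columns are hypothetical placeholders, not the real schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("c360_spark_processing").getOrCreate()

customers = spark.read.parquet("data/customers")  # assumed dimension source
orders = spark.read.parquet("data/orders")        # assumed fact source

# Join the fact to its dimension and aggregate per customer.
customer_analytics_c360 = (
    orders.join(customers, "customer_id")
    .groupBy("customer_id", "segment")
    .agg(
        F.count("order_id").alias("order_count"),
        F.sum("order_amount").alias("lifetime_value"),
    )
)

customer_analytics_c360.write.mode("overwrite").parquet(
    "data/customer_analytics_c360"
)
```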

Folder Structure

  • c360_*: a set of projects demonstrating how to define data as a product in Spark, along with its real-time equivalent in Flink SQL.
  • c360_spark_processing: a batch implementation using the star schema and Kimball method to organize facts and dimensions. The project description is here. This project was created with the shift_left project init c360_spark_processing command.
  • c360_mock_data: a set of CSV files to create synthetic data.
  • c360_api: a Python FastAPI-based REST API that exposes Customer 360 analytics for Marketing, Product, and Finance teams, built on top of the Spark data pipeline.
  • c360_flink_processing: builds the same analytics data product using Flink processing.
  • Kafka_consumer: a CLI-based tool that consumes the pipeline's Kafka topics to validate intermediate and final results (see the sketch after this list).
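
To give a feel for the Kafka_consumer tool, here is a minimal validation consumer using the confluent-kafka Python client. The broker address, consumer group, and topic name are assumptions, not the project's actual configuration.

```python
# Hedged sketch of a topic-validation consumer in the spirit of the
# Kafka_consumer tool; bootstrap address, group id, and topic name
# are placeholders.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "group.id": "c360-validation",          # assumed consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["customer_analytics_c360"])  # assumed topic name

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Print each record so pipeline output can be inspected manually.
        print(json.loads(msg.value().decode("utf-8")))
finally:
    consumer.close()
```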

To Do

  • A Kafka producer to simulate injecting data into the raw topics.
  • Add Tableflow in IaC to materialize the customer_analytics_c360 topic as an Iceberg table in AWS S3 or Confluent storage.
  • A Python Flask app with an HTML dashboard to present the data products from parquet files, using DuckDB as the query engine (see the sketch after this list).
  • Run the same SQL statements on Confluent Platform for Flink.
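
For the planned dashboard, the query layer over the parquet outputs could be as small as the DuckDB sketch below; the file path and column names are assumptions for illustration.

```python
# Minimal sketch of querying the data product's parquet output with DuckDB;
# the glob path and column names are hypothetical placeholders.
import duckdb

con = duckdb.connect()  # in-memory database
top_customers = con.execute(
    """
    SELECT customer_id, lifetime_value
    FROM read_parquet('data/customer_analytics_c360/*.parquet')
    ORDER BY lifetime_value DESC
    LIMIT 10
    """
).fetchdf()
print(top_customers)
```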