
Build a Serverless Real-Time Data Processing App

From this hands-on lab, with some changes from the lab itself.

Another Demo

Functional architecture

  • We have 10,000 refrigerated containers in the field, which send sensor telemetry messages using MQTT over HTTPS to a data ingestion and real-time event backbone that serves as the landing zone. This layer keeps data for 7 days, so on the left side we move records to the data lake for longer retention.

  • From the data lake, data engineers and data scientists can query data at rest, encrypted, and even build dashboards.
  • On the right side we have the cold chain monitoring component and the best-action processing in case a refrigerated container, or reefer, has a problem.
  • As part of the action processing we can notify a field engineer to work on the container, or persist the data in a database to trigger a downstream business process.

Hands-on architecture

  • For the scope of the demonstration we built for you, using AWS managed services, this is the architecture we propose:

  • We do not have a refrigerator at hand, so we took the data structure from your engineers and developed a small simulator to send records to the data ingestion layer (see the simulator sketch after this list).
  • AWS Kinesis Data Streams is the service that manages pub/sub processing over streams (similar to topics). We have two streams: telemetries and faulty-reefers.
  • The stateful logic processing is done inside Kinesis Data Analytics: the logic is to consume messages from the data stream, process them with a time-window constraint and filtering logic, then write messages to the faulty-reefers data stream.
  • On the right side we have the act part, where different consumers get messages asynchronously to process the faulty-reefer events (see the consumer sketch at the end of this page).
  • For the data lake, Firehose is the service for data movement and transformation, like ETL. The target is distributed storage, called S3. The persistence unit looks like a file-system folder and is named a bucket.
  • For dashboards we use QuickSight.
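
The simulator can be as small as a loop that generates plausible telemetry and pushes JSON records to the telemetries stream with boto3. A minimal sketch, assuming the stream already exists and that the record fields follow the structure used by the SQL further down (container_id, temperature, and so on); all values and names here are illustrative:

import json
import random
import time
import uuid

import boto3

kinesis = boto3.client("kinesis")

def make_telemetry(container_id: str) -> dict:
    # Shape follows the columns used by the analytics SQL; values are random.
    return {
        "container_id": container_id,
        "measurement_time": int(time.time()),
        "product_id": "P02",
        "temperature": round(random.uniform(2, 25), 1),  # some readings exceed 18°C
        "kilowatts": round(random.uniform(1, 5), 2),
        "oxygen_level": round(random.uniform(18, 21), 1),
        "carbon_dioxide_level": round(random.uniform(0, 5), 1),
        "fan_1": "on",
        "latitude": round(random.uniform(-90, 90), 5),
        "longitude": round(random.uniform(-180, 180), 5),
    }

containers = [str(uuid.uuid4())[:8] for _ in range(5)]
while True:
    record = make_telemetry(random.choice(containers))
    # Partition by container so all readings of a reefer land on the same shard.
    kinesis.put_record(
        StreamName="telemetries",
        Data=json.dumps(record),
        PartitionKey=record["container_id"],
    )
    time.sleep(1)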

Kinesis SDK for Python
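
The snippets on this page use boto3, the AWS SDK for Python (pip install boto3). A client object is enough for everything below; credentials and region are resolved from the usual AWS configuration:

import boto3

# Region shown here is only an example; boto3 also reads ~/.aws/config.
kinesis = boto3.client("kinesis", region_name="us-west-2")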

Create Kinesis Elements

Data Streams
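
A sketch of creating the two streams with boto3, assuming one shard is enough for the demo's throughput:

import boto3

kinesis = boto3.client("kinesis")

for name in ("telemetries", "faulty-reefers"):
    kinesis.create_stream(StreamName=name, ShardCount=1)
    # create_stream is asynchronous; block until the stream is ACTIVE.
    kinesis.get_waiter("stream_exists").wait(StreamName=name)
    print(name, "is active")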

Streams Analytics
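
The pump below implements the windowed fault-detection logic in the analytics application, assuming ts is the event-time column of the source stream: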

-- Continuously pump records from the source stream into FAULTY_REEFERS:
-- filter readings above 18°C, then count them per container over a
-- one-minute tumbling window.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "FAULTY_REEFERS"
  SELECT STREAM "container_id",
                TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
                TUMBLE_STOP(ts, INTERVAL '1' MINUTE) AS window_stop,
                COUNT("temperature") AS high_temp_count
  FROM "SOURCE_SQL_STREAM_001"
  WHERE "temperature" > 18
  GROUP BY "container_id", TUMBLE(ts, INTERVAL '1' MINUTE);