AWS Step Functions¶
AWS Step Functions is a fully managed service that we can use to coordinate the components of distributed applications and microservices using visual workflows.
Value propositions¶
- Resilient workflow automation
- Built-in error handling
- AWS service integration
- Auditable execution history and visual monitoring
- Serverless: scales automatically
- Supports both short-lived execution flows and long-lived transactions.
- Standard Workflows follow an exactly-once execution model. Express Workflows follow an at-least-once model when called asynchronously and an at-most-once model when called synchronously (so a request may be lost). This is an important distinction when choosing a workflow type.
- Reuse business logic in different flows
- Standard Workflows are billed by the number of state transitions, while Express Workflows are billed by the number of executions, their duration, and the memory consumed.
- Supports a Map state that runs a set of workflow steps for each item in a dataset, in parallel. In Inline mode, the iterations run in the context of the workflow. In Distributed mode, each iteration runs as a child workflow execution, in parallel, which allows scaling beyond the 40 concurrent iterations supported by Inline mode.
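The two Map modes mentioned above differ mainly in the processor configuration of the state. The following is a minimal sketch, in Python, that builds both variants as dictionaries and prints the Distributed one as JSON. The state name `ProcessItem`, the `$.items` path, and the concurrency values are assumptions chosen for illustration only.

```python
import json


def map_state(mode: str, max_concurrency: int) -> dict:
    """Build a Map state that runs one Pass step for each element of $.items."""
    if mode not in ("INLINE", "DISTRIBUTED"):
        raise ValueError(f"unknown Map mode: {mode}")
    processor_config = {"Mode": mode}
    if mode == "DISTRIBUTED":
        # Distributed mode starts child workflow executions,
        # so the type of those child executions has to be stated.
        processor_config["ExecutionType"] = "STANDARD"
    return {
        "Type": "Map",
        "ItemsPath": "$.items",  # hypothetical location of the dataset in the input
        "MaxConcurrency": max_concurrency,
        "ItemProcessor": {
            "ProcessorConfig": processor_config,
            "StartAt": "ProcessItem",
            "States": {"ProcessItem": {"Type": "Pass", "End": True}},
        },
        "End": True,
    }


inline_map = map_state("INLINE", max_concurrency=10)
distributed_map = map_state("DISTRIBUTED", max_concurrency=1000)
print(json.dumps(distributed_map, indent=2))
```

In a real workflow the `Pass` step would be replaced by the Task states that do the per-item work.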
Concepts¶
The workflow is defined using the Amazon States Language (ASL), a JSON-based notation.
- All work in Step Functions is done by tasks.
- A task performs work by using an activity or an AWS Lambda function, or by passing parameters to the API actions of other services.
- An activity is an application that we write and host on AWS, on premises or on mobile devices.
- Activity workers execute the application code and report success or failure.
- The supported patterns are sequential execution of tasks, conditional branching, looping, try-catch-finally for error handling and compensation, and parallel execution.
- A Task state ("Type": "Task") represents a single unit of work performed by a state machine.
- Can integrate human tasks.
- Standard Workflows are long-running flows. There is also the Express Workflow, a short-duration execution (the duration has to be less than 5 minutes) designed for high-volume workloads, with start rates above 100,000 executions per second. It keeps state in memory.
- When calling an external service, one of the parameters is the TaskToken, which is sent back to Step Functions so that the corresponding task can receive the asynchronous response.
- For Standard Workflows, the maximum duration is 365 days.
- Input data can be passed in the StartExecution call.
- For Standard Workflows, you can retrieve execution results from the execution history using external callers, such as the DescribeExecution action (awslocal stepfunctions describe-execution --execution-arn ...).
- A path is a string beginning with $ that you can use to identify components within JSON text.
- InputPath can limit the input that is passed to a state by filtering the JSON with a path.
- OutputPath enables you to select a portion of the state output to pass to the next state.
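To make the effect of InputPath and OutputPath concrete, here is a small Python sketch that imitates the filtering on plain dictionaries. It is not how Step Functions is implemented: it only understands simple dotted paths such as `$.customer`, and it ignores Parameters, ResultSelector, and ResultPath. The sample input and the `work` function are invented for the example.

```python
from typing import Any, Callable


def resolve_path(path: str, document: Any) -> Any:
    """Resolve a simple dotted path such as "$.order.total" against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError("a path must begin with $")
    node = document
    for key in path[1:].split("."):
        if key:  # skip the empty segment that follows the leading "$"
            node = node[key]
    return node


def run_state(state_input: Any, input_path: str, output_path: str,
              work: Callable[[Any], Any]) -> Any:
    """Apply InputPath, run the state's work, then apply OutputPath."""
    effective_input = resolve_path(input_path, state_input)
    state_output = work(effective_input)
    return resolve_path(output_path, state_output)


state_input = {"customer": {"id": "c-1", "limit": 7000}, "trace": "abc"}

result = run_state(
    state_input,
    input_path="$.customer",   # the work only sees the customer object
    output_path="$.decision",  # only the decision is passed to the next state
    work=lambda customer: {
        "decision": {"approved": customer["limit"] >= 5000},
        "debug": customer,
    },
)
print(result)  # {'approved': True}
```

The `trace` field never reaches the work function, and the `debug` field never reaches the next state, which is the point of the two filters.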
Integration¶
- A Step Functions execution can be started by HTTP requests through API Gateway, IoT Rules, EventBridge, Lambda, and other services.
- For asynchronous Express Workflows (SDK StartExecution), we must poll CloudWatch Logs to get the results.
- Step Functions uses a context object to keep, in JSON, the state of the state machine execution.
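The context object is also where a task token comes from: a state definition can read it with the `$$.Task.Token` path. The sketch below builds, as a Python dictionary, a Task state that sends a message to an SQS queue and then waits for a callback. The queue URL and the state names are hypothetical, and the choice of SQS as the external service is an assumption made for the example.

```python
import json


def callback_task(queue_url: str, next_state: str) -> dict:
    """Build a Task state that pauses until an external worker returns the token."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix tells Step Functions to pause the task
        # until the token is sent back.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                "input.$": "$",                 # the state input
                "taskToken.$": "$$.Task.Token", # read from the context object
            },
        },
        "Next": next_state,
    }


# Hypothetical queue URL and state name, for illustration only.
state = callback_task(
    "https://sqs.us-west-2.amazonaws.com/000000000000/approval-requests",
    next_state="Record approval",
)
print(json.dumps(state, indent=2))
```

The worker that reads the message would then report the outcome with the SendTaskSuccess or SendTaskFailure action, passing the token it received.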
Call Lambda
The task needs a reference to the function ARN, and then specifies the input and output parameters and the retry logic. The parameter "Payload.$": "$" takes the state input and sends it to the Lambda function.
"Get credit limit": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "OutputPath": "$.Payload",
  "Parameters": {
    "Payload.$": "$",
    "FunctionName": "arn:aws:lambda:us-west-2:000000000000:function:GetCreditLimit"
  },
  "Retry": [
    {
      "ErrorEquals": [
        "Lambda.ServiceException",
        "Lambda.AWSLambdaException",
        "Lambda.SdkClientException",
        "Lambda.TooManyRequestsException"
      ],
      "IntervalSeconds": 2,
      "MaxAttempts": 6,
      "BackoffRate": 2
    }
  ],
  "Next": "Credit applied >= 5000?"
},
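The Retry block above waits IntervalSeconds before the first retry and multiplies the wait by BackoffRate before each following one. The short Python sketch below works out the resulting schedule for the values used here. It is only an arithmetic illustration and leaves out optional settings such as a maximum delay or jitter.

```python
from typing import List


def retry_delays(interval_seconds: float, max_attempts: int,
                 backoff_rate: float) -> List[float]:
    """Return the wait, in seconds, before each retry attempt."""
    return [interval_seconds * backoff_rate ** attempt
            for attempt in range(max_attempts)]


# Values taken from the "Get credit limit" state.
delays = retry_delays(interval_seconds=2, max_attempts=6, backoff_rate=2)
print(delays)       # [2, 4, 8, 16, 32, 64]
print(sum(delays))  # 126
```

With these settings a persistently failing call is retried six times over roughly two minutes of waiting before the error is handed to the catchers, if any.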
Error Handling¶
Step Functions supports errors and retries via a looping pattern and provides try/catch/finally logic for known or unknown errors.
At a high level, Task and Parallel states can use fields named Retry and Catch for error handling. When a state reports an error and there is no Retry, or the retries don't resolve the problem, Step Functions looks through the catchers for a matching error and transitions to the state named in the Next field.
Each catcher can specify multiple errors to handle. The reserved name States.ALL is a wildcard that matches any error name.
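The catcher lookup described above can be sketched in a few lines of Python. This is a simplified model, not the service's implementation: catchers are scanned in order, and a catcher matches on an exact error name or on the States.ALL wildcard. Other reserved error names and their special cases are ignored. The error and state names in the example are hypothetical.

```python
from typing import List, Optional


def find_catcher(error_name: str, catchers: List[dict]) -> Optional[str]:
    """Return the Next state of the first catcher that matches, or None."""
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        if error_name in names or "States.ALL" in names:
            return catcher["Next"]
    return None  # no catcher matched: the execution fails


# Hypothetical Catch field of a Task state.
catchers = [
    {"ErrorEquals": ["CreditCheckFailed", "CustomerNotFound"], "Next": "Reject application"},
    {"ErrorEquals": ["States.ALL"], "Next": "Notify support"},
]

print(find_catcher("CustomerNotFound", catchers))  # Reject application
print(find_catcher("SomethingElse", catchers))     # Notify support
```

Because the scan stops at the first match, a States.ALL catcher works best as the last entry in the list.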
Hands-on demo¶
We can use LocalStack to develop Step Functions workflows in VS Code and run them locally. See the labs/step/first-solution folder.
Basic Step flow to Lambda call¶
See the Lab in labs/step/step2lambda folder.
Simple solution¶
This demo includes all the basic control flow constructs for a workflow: call lambda, read from DynamoDB, and perform Map computation. See the labs/step/first-solution.
Image processing from S3¶
Image processing for an autonomous car: travelers upload selfie photos to build a collection, against which a camera can later send a new image to recognize a traveler's face so that he or she can enter the car. This example is based on the image processing with Step Functions workshop. See this repo for SAM / CloudFormation templates and function code.
Step Function FAQs¶
- Where is process instance information persisted for running workflows?
- How to support re-entrance?
- How to support disaster recovery (DR)?
Expected Skill Set¶
What are the different state types?
Pass, Task, Choice, Wait, Succeed, Fail, Parallel, Map.
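As an illustration of how several of these state types fit together, the sketch below defines a small state machine as a Python dictionary and runs two basic sanity checks on it. The state names and the 5000 threshold are made up for the example, and the checks are far less thorough than the validation done by the service.

```python
from typing import List

STATE_TYPES = {"Pass", "Task", "Choice", "Wait", "Succeed", "Fail", "Parallel", "Map"}

# Hypothetical workflow using five of the eight state types.
definition = {
    "StartAt": "Prepare",
    "States": {
        "Prepare": {"Type": "Pass", "Result": {"amount": 6000}, "Next": "Check amount"},
        "Check amount": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.amount", "NumericGreaterThanEquals": 5000, "Next": "Cool down"}
            ],
            "Default": "Done",
        },
        "Cool down": {"Type": "Wait", "Seconds": 5, "Next": "Rejected"},
        "Rejected": {"Type": "Fail", "Error": "AmountTooHigh", "Cause": "Amount is 5000 or more"},
        "Done": {"Type": "Succeed"},
    },
}


def unknown_types(defn: dict) -> List[str]:
    """List state types that are not one of the eight known types."""
    return sorted({state["Type"] for state in defn["States"].values()} - STATE_TYPES)


def dangling_targets(defn: dict) -> List[str]:
    """List transition targets that do not name a state in the definition."""
    targets = {defn["StartAt"]}
    for state in defn["States"].values():
        targets.update(state[key] for key in ("Next", "Default") if key in state)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return sorted(targets - set(defn["States"]))


print(unknown_types(definition))     # []
print(dangling_targets(definition))  # []
```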
Deeper dive¶
- Main product marketing page
- 10-minute getting started
- Information resources like reference architecture
- How Step Functions works.
- Design patterns for Step Functions.
- Git repo with Step Functions examples defined as CloudFormation templates.
- Developer guide.
- VS Code extension with visualization
- Step Functions workshop.
- Saga with State machine.
- Create serverless workflow with Step - hands-on tutorial 10 minutes