A Workflow contains one or more steps. Each step is a self-contained, individually retriable component of a Workflow. Steps can optionally emit state that allows a Workflow to persist and continue from that step, even if the Workflow fails due to a network or infrastructure issue.
This is a small guidebook on how to build more resilient and correct Workflows.
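To make the step model described above concrete, here is a minimal in-memory sketch. `ToyStep` is a hypothetical stand-in for the engine, not the Cloudflare Workflows API. Its `do(name, callback)` method retries a failing callback a few times and persists each completed result under the step's name, so a re-run of the same Workflow reuses finished steps instead of repeating them.

```typescript
// Hypothetical toy model of a step runner, NOT the Cloudflare Workflows engine.
type StepFn<T> = () => Promise<T>;

class ToyStep {
  // Completed step results keyed by step name; shared across simulated runs.
  private readonly store: Map<string, unknown>;
  private readonly maxAttempts: number;

  constructor(store: Map<string, unknown>, maxAttempts = 3) {
    this.store = store;
    this.maxAttempts = maxAttempts;
  }

  async do<T>(name: string, fn: StepFn<T>): Promise<T> {
    if (this.store.has(name)) {
      // Finished in an earlier run: return the persisted result without re-running.
      return this.store.get(name) as T;
    }
    let lastError: unknown;
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        const result = await fn();
        this.store.set(name, result); // persist the step's output
        return result;
      } catch (err) {
        lastError = err; // only this step is retried, not the whole Workflow
      }
    }
    throw lastError;
  }
}

// Two steps; the second fails once with a transient error, then succeeds.
async function demo(): Promise<{ total: number; fetchCalls: number; flakyCalls: number }> {
  const store = new Map<string, unknown>();
  let fetchCalls = 0;
  let flakyCalls = 0;

  const run = async (step: ToyStep): Promise<number> => {
    const a = await step.do("fetch data", async () => {
      fetchCalls++;
      return 40;
    });
    const b = await step.do("flaky call", async () => {
      flakyCalls++;
      if (flakyCalls < 2) throw new Error("transient network error");
      return 2;
    });
    return a + b;
  };

  await run(new ToyStep(store)); // first run: "flaky call" is retried once
  const total = await run(new ToyStep(store)); // simulated restart: both results come from the store
  return { total, fetchCalls, flakyCalls };
}
```

`demo()` resolves to `{ total: 42, fetchCalls: 1, flakyCalls: 2 }`: the flaky step ran twice because of the retry, and the simulated restart re-executed neither step. The real engine adds configurable retries, backoff, and durable storage; the sketch only models the two properties described above.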
Ensure API/Binding calls are idempotent
Because a step might be retried multiple times, your steps should (ideally) be idempotent. Idempotency is the property of an operation (in this case, a step) that can be applied multiple times without changing the result beyond the initial application.
As an example, assume you have a Workflow that charges your customers, and you really do not want to charge them twice by accident. Before charging them, you should check whether they were already charged.
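A minimal sketch of the check-before-charge pattern follows. `FakePaymentProcessor`, `chargeMonthlySubscription`, and `retry` are hypothetical: the processor is an in-memory stand-in for a real payment API, and `retry` stands in for the engine's step retries. In a real Workflow, the body of `chargeMonthlySubscription` would be the callback passed to `step.do`.

```typescript
// Hypothetical in-memory stand-in for a payment processor; a real integration
// would call the processor's API over HTTP or through a binding.
class FakePaymentProcessor {
  private readonly charges: Array<{ customerId: string; amount: number }> = [];
  // When true, the next charge is committed but the caller sees an error,
  // like a connection that drops after the processor has saved the charge.
  failNextResponse = false;

  async isCharged(customerId: string): Promise<boolean> {
    return this.charges.some((c) => c.customerId === customerId);
  }

  async charge(customerId: string, amount: number): Promise<void> {
    this.charges.push({ customerId, amount });
    if (this.failNextResponse) {
      this.failNextResponse = false;
      throw new Error("connection reset after commit");
    }
  }

  chargeCount(customerId: string): number {
    return this.charges.filter((c) => c.customerId === customerId).length;
  }
}

// Body of the "charge customer" step, written as a plain function so it can run on its own.
async function chargeMonthlySubscription(
  processor: FakePaymentProcessor,
  customerId: string,
): Promise<"charged" | "already charged"> {
  // Check first: an earlier attempt may have committed the charge even though
  // it reported an error, or the engine may have restarted mid-step.
  if (await processor.isCharged(customerId)) {
    return "already charged";
  }
  // Non-idempotent call: only reached when no charge exists yet.
  await processor.charge(customerId, 10.0);
  return "charged";
}

// Hypothetical stand-in for the engine retrying a failed step.
async function retry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

async function demo(): Promise<{ outcome: string; charges: number }> {
  const processor = new FakePaymentProcessor();
  processor.failNextResponse = true; // attempt 1 commits the charge but throws
  const outcome = await retry(() => chargeMonthlySubscription(processor, "customer-1"));
  return { outcome, charges: processor.chargeCount("customer-1") };
}
```

`demo()` resolves to `{ outcome: "already charged", charges: 1 }`. Without the `isCharged` check, the retry would charge the customer a second time.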
Make your steps granular
Steps should be as self-contained as possible. This allows your own logic to be more durable in case of failures in third-party APIs, network errors, and so on.
You can also think of it as a transaction, or a unit of work.
✅ Minimize the number of API/binding calls per step (unless you need multiple calls to prove idempotency).
Otherwise, your Workflow may be less durable than you expect, and you may encounter undefined behaviour. You can avoid this by following the rules below:
🔴 Do not encapsulate your entire logic in one single step.
🔴 Do not call separate services in the same step (unless you need it to prove idempotency).
🔴 Do not make too many service calls in the same step (unless you need it to prove idempotency).
🔴 Do not do too much CPU-intensive work inside a single step: the engine may sometimes have to restart, and it will start over from the beginning of that step.
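The difference between one coarse step and granular steps shows up as soon as a retry happens. In this sketch, `ToyStep` is a hypothetical in-memory runner (not the real engine) that retries a failing step and caches completed results by name, and `makeServices` returns two hypothetical services whose call counts are recorded. The mail service fails once before recovering.

```typescript
// Hypothetical toy step runner, NOT the real engine: caches results by name
// and retries a failing step up to `maxAttempts` times.
type StepFn<T> = () => Promise<T>;

class ToyStep {
  private readonly results = new Map<string, unknown>();
  private readonly maxAttempts: number;

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
  }

  async do<T>(name: string, fn: StepFn<T>): Promise<T> {
    if (this.results.has(name)) return this.results.get(name) as T;
    let lastError: unknown;
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        const result = await fn();
        this.results.set(name, result);
        return result;
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }
}

// Hypothetical services; `calls` records how often each one is invoked.
function makeServices() {
  const calls = { fetchUser: 0, sendEmail: 0 };
  let emailFailures = 1; // the mail service fails once, then recovers
  return {
    calls,
    fetchUser: async () => {
      calls.fetchUser++;
      return { id: "user-1", email: "user@example.com" };
    },
    sendEmail: async (to: string) => {
      calls.sendEmail++;
      if (emailFailures > 0) {
        emailFailures--;
        throw new Error(`mail service unavailable for ${to}`);
      }
      return "sent";
    },
  };
}

// 🔴 One step calls two separate services: when sendEmail fails,
// the retry repeats fetchUser as well.
async function coarse(): Promise<{ fetchUser: number; sendEmail: number }> {
  const svc = makeServices();
  const step = new ToyStep();
  await step.do("fetch user and send email", async () => {
    const user = await svc.fetchUser();
    return svc.sendEmail(user.email);
  });
  return svc.calls;
}

// ✅ One service call per step: only the failing step is retried.
async function granular(): Promise<{ fetchUser: number; sendEmail: number }> {
  const svc = makeServices();
  const step = new ToyStep();
  const user = await step.do("fetch user", () => svc.fetchUser());
  await step.do("send welcome email", () => svc.sendEmail(user.email));
  return svc.calls;
}
```

`coarse()` resolves to `{ fetchUser: 2, sendEmail: 2 }`, while `granular()` resolves to `{ fetchUser: 1, sendEmail: 2 }`: splitting the work means a failure in one service does not repeat calls to another.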
Do not rely on state outside of a step
Workflows may hibernate and lose all in-memory state. This happens when the engine detects that there is no pending work and can hibernate until it needs to wake up (because of a sleep, retry, or event).
This means that you should not store state outside of a step.
Instead, you should build top-level state exclusively from the values returned by step.do.
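The sketch below simulates hibernation under a stated assumption: after waking up, the Workflow's run method is executed again from the top, completed steps return their persisted results without re-running their callbacks, and everything else starts from scratch. `ToyStep`, `BadWorkflow`, `GoodWorkflow`, and `replayAfterHibernation` are hypothetical names used only for illustration.

```typescript
// Hypothetical toy step runner: results persist in `store` across simulated
// hibernation, but no other in-memory state does.
type StepFn<T> = () => Promise<T>;

class ToyStep {
  private readonly store: Map<string, unknown>;

  constructor(store: Map<string, unknown>) {
    this.store = store;
  }

  async do<T>(name: string, fn: StepFn<T>): Promise<T> {
    if (this.store.has(name)) return this.store.get(name) as T;
    const result = await fn();
    this.store.set(name, result);
    return result;
  }
}

// 🔴 Bad: state lives in a variable outside the step.
class BadWorkflow {
  private imageList: string[] = [];

  async run(step: ToyStep): Promise<number> {
    await step.do("list images", async () => {
      this.imageList = ["cat.png", "dog.png"]; // lost if the Workflow hibernates
    });
    return this.imageList.length;
  }
}

// ✅ Good: state is built only from step.do return values.
class GoodWorkflow {
  async run(step: ToyStep): Promise<number> {
    const imageList = await step.do("list images", async () => ["cat.png", "dog.png"]);
    return imageList.length;
  }
}

// The first run completes the step; then a fresh instance replays run()
// against the persisted store, as if the Workflow had woken up.
async function replayAfterHibernation(
  make: () => { run(step: ToyStep): Promise<number> },
): Promise<number> {
  const store = new Map<string, unknown>();
  await make().run(new ToyStep(store));
  return make().run(new ToyStep(store));
}
```

`replayAfterHibernation(() => new BadWorkflow())` resolves to `0`, because the cached step is not executed again and `imageList` starts out empty; the `GoodWorkflow` version resolves to `2`.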
Do not mutate your incoming events
The event passed to your Workflow’s run method is immutable: changes you make to the event are not persisted across steps and/or Workflow restarts.
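A minimal sketch, assuming that after a restart the run method receives the original event again and completed steps return their cached results. `ToyStep`, `OrderParams`, `runBad`, `runGood`, and `runTwice` are hypothetical, and only a `payload` field of the event is modelled. The good version also types the payload as `Readonly` so the compiler rejects accidental mutation.

```typescript
// Hypothetical toy step runner: results persist across simulated restarts.
type StepFn<T> = () => Promise<T>;

class ToyStep {
  private readonly store: Map<string, unknown>;

  constructor(store: Map<string, unknown>) {
    this.store = store;
  }

  async do<T>(name: string, fn: StepFn<T>): Promise<T> {
    if (this.store.has(name)) return this.store.get(name) as T;
    const result = await fn();
    this.store.set(name, result);
    return result;
  }
}

// Hypothetical event shape; only `payload` is modelled here.
interface OrderParams {
  orderId: string;
  total: number;
}
type OrderEvent = { payload: OrderParams };

// 🔴 Bad: the step mutates the incoming event.
async function runBad(event: OrderEvent, step: ToyStep): Promise<number> {
  await step.do("apply credit", async () => {
    event.payload.total -= 10; // not persisted: lost after a restart
  });
  return event.payload.total;
}

// ✅ Good: treat the event as read-only and return derived values from the step.
async function runGood(
  event: { readonly payload: Readonly<OrderParams> },
  step: ToyStep,
): Promise<number> {
  return step.do("apply credit", async () => event.payload.total - 10);
}

// Runs the same instance twice; each run gets a fresh copy of the original
// event, modelling the event being supplied again after a restart.
async function runTwice(
  run: (event: OrderEvent, step: ToyStep) => Promise<number>,
): Promise<[number, number]> {
  const original: OrderParams = { orderId: "order-1", total: 100 };
  const store = new Map<string, unknown>();
  const first = await run({ payload: { ...original } }, new ToyStep(store));
  const second = await run({ payload: { ...original } }, new ToyStep(store));
  return [first, second];
}
```

`runTwice(runBad)` resolves to `[90, 100]`: the credit silently disappears after the restart. `runTwice(runGood)` resolves to `[90, 90]`.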
Name steps deterministically
Naming a step dynamically will prevent it from being cached and cause the step to be re-run unnecessarily. Step names act as the “cache key” in your Workflow.
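The sketch below counts how often a step body executes when one Workflow instance runs twice, for example before and after hibernation. `ToyStep` is a hypothetical in-memory runner that caches results by step name, and the `now` parameter is a hypothetical stand-in for `Date.now()`, fixed to different values per run so the result is deterministic.

```typescript
// Hypothetical toy step runner: caches completed results by step name.
type StepFn<T> = () => Promise<T>;

class ToyStep {
  private readonly store: Map<string, unknown>;

  constructor(store: Map<string, unknown>) {
    this.store = store;
  }

  async do<T>(name: string, fn: StepFn<T>): Promise<T> {
    if (this.store.has(name)) return this.store.get(name) as T;
    const result = await fn();
    this.store.set(name, result);
    return result;
  }
}

// 🔴 Bad: the step name embeds a value that changes between runs.
// `now` stands in for Date.now(), which differs each time the Workflow wakes up.
async function runBad(step: ToyStep, now: number, work: StepFn<string>): Promise<string> {
  return step.do(`fetch report at ${now}`, work);
}

// ✅ Good: the same logical step always has the same name.
async function runGood(step: ToyStep, _now: number, work: StepFn<string>): Promise<string> {
  return step.do("fetch report", work);
}

// Replays one instance twice with different clock values and counts how
// often the step body actually executes.
async function countExecutions(run: typeof runBad): Promise<number> {
  const store = new Map<string, unknown>();
  let executions = 0;
  const work = async (): Promise<string> => {
    executions++;
    return "report";
  };
  await run(new ToyStep(store), 1000, work);
  await run(new ToyStep(store), 2000, work);
  return executions;
}
```

`countExecutions(runBad)` resolves to `2` because the second run looks up a different cache key, while `countExecutions(runGood)` resolves to `1`.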