The idea is:
Allow a Health-Check workflow to be attached to any other workflow in the workflow settings, similar to the way an Error workflow is attached.
- The Health-Check workflow would contain nodes that make lightweight, “everything good?” requests to downstream services, or lightweight checks for other resources, and return an overall healthy / not-healthy status.
- The workflow to which the Health-Check workflow is attached would prevent executions, or accept and queue them (i.e. immediately pause/wait), while health checks are failing. Resuming in-process workflows that are in a wait state would also be held until health checks are “passing” again.
- For workflows that can be activated, the attached Health-Check workflow would be active only while the workflow is active.
- For workflows that cannot be activated, the attached Health-Check workflow would run “up front” on every execution, before attempting to execute any workflow steps.
- Notifications, or other secondary actions (like a restart call to a process-manager service), could be built into the Health-Check workflow.
- Intervals for how often a health check is performed, and how long the workflow execution is suspended, would also be specified in the settings of the workflow to which the Health-Check workflow is attached.
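To make the gating behaviour above more concrete, here is a minimal, language-agnostic sketch in Python. It is not n8n code and does not use any n8n API: `HealthGate`, `Decision`, `check_interval_s`, and `queue_when_unhealthy` are hypothetical names invented for illustration. It only shows one way the "check on an interval, then run / queue / reject" decision could work.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    RUN = "run"        # dependencies healthy: execute normally
    QUEUE = "queue"    # accept the execution but hold it in a wait state
    REJECT = "reject"  # refuse the execution outright


@dataclass
class HealthGate:
    """Hypothetical gate guarding one workflow (illustration only)."""
    checks: List[Callable[[], bool]]   # each returns True when its dependency is healthy
    check_interval_s: float = 30.0     # assumed setting: how often checks are re-run
    queue_when_unhealthy: bool = True  # assumed setting: queue rather than reject
    _healthy: bool = field(default=True, init=False)
    _last_check: float = field(default=float("-inf"), init=False)

    def refresh(self, now: float) -> bool:
        """Re-run all checks if the interval has elapsed; return overall status."""
        if now - self._last_check >= self.check_interval_s:
            results = []
            for check in self.checks:
                try:
                    results.append(bool(check()))
                except Exception:
                    # A check that raises is treated as "not healthy".
                    results.append(False)
            self._healthy = all(results)
            self._last_check = now
        return self._healthy

    def admit(self, now: float) -> Decision:
        """Decide what to do with a new (or resuming) execution at time `now`."""
        if self.refresh(now):
            return Decision.RUN
        return Decision.QUEUE if self.queue_when_unhealthy else Decision.REJECT


if __name__ == "__main__":
    service_up = {"value": True}  # stand-in for a real downstream service
    gate = HealthGate(checks=[lambda: service_up["value"]], check_interval_s=30.0)
    print(gate.admit(now=0.0))    # checks run for the first time
    service_up["value"] = False
    print(gate.admit(now=10.0))   # inside the interval: cached status is reused
    print(gate.admit(now=30.0))   # interval elapsed: checks run again
```

In this sketch the same `admit` call would be used both for new executions and for executions resuming from a wait, and the status is cached between checks so that the health check itself stays cheap. A real implementation would also need the "how long to suspend" setting and the notification hooks described above.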
My use case:
Workflows sometimes fail and stop, after partial completion, because a downstream service or resource required in a later step is temporarily unavailable.
I think it would be beneficial to add this because:
Long-running workflows and/or workflows with “hard to reverse” or “hard to repeat” (non-idempotent) steps would benefit from a separate, background process that checks availability/health of all downstream service / resource dependencies, and prevents the workflow from executing or resuming from a wait until the dependencies are again available/healthy.
Any resources to support this?
- Circuit Breaker Pattern
- Docs for existing Create and set an error workflow
Are you willing to work on this?
Could help with testing. Could possibly help with refining the design/approach. Could possibly help with development (less likely).