The idea is:
To introduce a native bulk re-execution feature within the "Executions" view of a workflow, or in the "Executions" tab at the project level. This would allow an operator to filter for all failed executions, select them in bulk (e.g., via checkboxes or a "select all" action), and re-run them all with a single "Re-execute Selected" command. The system would then re-run each selected execution using its original trigger data.
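To make the proposed behavior concrete, here is a minimal sketch of the bulk-retry logic: filter for failed executions, re-run each with its original trigger data, and report the outcome. The `Execution` structure and the `retry` callable are illustrative placeholders, not n8n's actual internal API.

```python
# Illustrative sketch of "Re-execute Selected": the data model and retry
# callable are hypothetical stand-ins, not n8n's real internals.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Execution:
    id: str
    status: str          # e.g. "success" | "error"
    trigger_data: dict   # payload the workflow originally received


def bulk_retry(executions: Iterable[Execution],
               retry: Callable[[Execution], bool]) -> dict:
    """Re-run every failed execution with its original trigger data."""
    report = {"retried": 0, "succeeded": 0, "failed_again": 0}
    for ex in executions:
        if ex.status != "error":
            continue  # operator filtered the list down to failed runs
        report["retried"] += 1
        if retry(ex):  # re-execute using ex.trigger_data
            report["succeeded"] += 1
        else:
            report["failed_again"] += 1
    return report


# Example: two failed executions, one of which succeeds on retry.
runs = [
    Execution("1", "success", {}),
    Execution("2", "error", {"order_id": 42}),
    Execution("3", "error", {"order_id": 43}),
]
result = bulk_retry(runs, retry=lambda ex: ex.trigger_data["order_id"] == 42)
print(result)  # {'retried': 2, 'succeeded': 1, 'failed_again': 1}
```

A summary report like this is worth surfacing in the UI after a bulk action, so the operator immediately sees which transactions still need attention.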
My use case:
Our workflows handle thousands of critical, real-time transactions daily. If a dependent service (like a third-party API) has an outage, hundreds of executions can fail in a short period. Once the service is restored, we need to reprocess all of those failed transactions. The current process requires an operator to manually find and re-run each failed execution one-by-one, which is not feasible at scale and delays recovery.
I think it would be beneficial to add this because:
This feature addresses a critical operational need for managing workflows in a real-world production environment.
- Drastically Improves Recovery Time: It transforms a slow, manual recovery process into a swift, one-click action, minimizing data processing delays after an outage.
- Reduces Operational Overhead: It empowers operators to manage large-scale failures efficiently without the stress and human error associated with repetitive manual tasks.
- Enhances Enterprise Readiness: It provides a robust, built-in tool for operational resilience, helping teams meet their Recovery Time Objectives (RTOs). This kind of reliability is essential for enterprise-grade deployments where meeting service-level objectives is paramount.
- Provides a Vital Safety Net: While complex queuing patterns are a best practice for some teams, this feature offers a more accessible, native recovery option for all workflows, making the platform more forgiving and powerful.