I am running n8n in a Docker container and I am trying to simulate sudden crashes/failures of the n8n server. When is an execution's status regarded as cancelled, and when is it regarded as error?
Subject: Understanding Execution States During n8n Crashes
Great question! Understanding execution states during failures is crucial for building resilient workflows. Let me explain the difference between cancelled and error states in n8n.
Execution Status Types
1. Error State
An execution is marked as error when:
A node throws an exception/error during execution
A workflow encounters a runtime error (API timeout, invalid data, etc.)
The workflow completes abnormally but n8n is still running
Errors can be caught and handled using a node's On Error setting or a dedicated Error Workflow
Example scenarios:
HTTP Request node receives 404/500 response
Code node throws an exception
Database connection fails
Validation errors
2. Cancelled State
An execution is marked as cancelled when:
User manually stops the execution from the UI
n8n server crashes/restarts during execution
Docker container stops abruptly
Process is killed (SIGKILL, SIGTERM)
System runs out of resources mid-execution
Key difference: cancelled means the execution was interrupted externally, not by a logical error in the workflow. (Note: depending on your n8n version, externally interrupted executions may instead be reported as crashed or unknown.)
Simulating Crashes for Testing
To simulate different failure scenarios:
Scenario 1: Simulate Server Crash
```
# Force kill the n8n container
docker kill n8n

# Or restart it abruptly
docker restart n8n
```
Result: Running executions → cancelled
Scenario 2: Simulate Graceful Shutdown
```
# Send SIGTERM (graceful shutdown)
docker stop n8n
```
Result: n8n attempts to finish in-flight executions; anything still running when the shutdown timeout expires is marked cancelled
Scenario 3: Simulate Resource Exhaustion
```
# Limit container memory and trigger OOM
docker run --memory="512m" --name n8n n8nio/n8n
```
Result: Executions interrupted by OOM killer → cancelled
Scenario 4: Trigger Error State (Not Crash)
Create a workflow with intentional errors:
HTTP Request to non-existent endpoint
Code node with throw new Error('Test error')
Invalid credentials
Result: Execution completes with error status
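To make Scenario 4 concrete, here is a minimal sketch of a Code node body that deliberately fails. The message text and the `shouldFail` flag are purely illustrative; in a real Code node you would simply throw unconditionally:

```javascript
// Sketch of an n8n Code node that deliberately fails.
// An uncaught throw inside a Code node ends the execution
// with status=error (not cancelled), since n8n itself keeps running.
function simulateFailingNode(shouldFail) {
  if (shouldFail) {
    throw new Error('Test error: simulated validation failure');
  }
  // n8n Code nodes return an array of items
  return [{ json: { ok: true } }];
}
```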
Practical Testing Approach
Here’s a workflow to test crash handling:
Create a long-running workflow:
```
// Code node - simulates a long execution (~60 seconds)
const start = Date.now();
while (Date.now() - start < 60000) {
  // Sleep 1 second per loop iteration
  await new Promise(resolve => setTimeout(resolve, 1000));
}
// Code nodes must return an array of items
return [{ json: { success: true } }];
```
Start the workflow execution
During execution, simulate crash:
```
docker kill n8n
docker start n8n
```
Check execution status:
Should show as cancelled
Can be found in execution history
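Instead of checking the UI by hand after each restart, you can poll the execution list through the n8n Public API. This assumes the Public API is enabled and you have generated an API key; the host, port, and exact set of status strings can vary by version, so treat this as a sketch:

```javascript
// Map an execution status string (as returned by the n8n API)
// to the failure category discussed above.
function classifyStatus(status) {
  if (status === 'error') return 'workflow logic failed';
  if (status === 'canceled' || status === 'crashed') return 'interrupted externally';
  if (status === 'success') return 'completed';
  return 'in progress or unknown';
}

// Hypothetical polling helper; replace host and apiKey with your values.
async function lastExecutionStatus(host, apiKey) {
  const res = await fetch(`${host}/api/v1/executions?limit=1`, {
    headers: { 'X-N8N-API-KEY': apiKey },
  });
  const body = await res.json();
  return classifyStatus(body.data?.[0]?.status);
}
```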
Important Behaviors to Note
Queue Mode (if enabled)
With EXECUTIONS_MODE=queue:
Queued (not-yet-started) executions survive crashes
They are picked up by workers automatically after recovery
The status of an execution that was mid-flight depends on how far it had progressed when the crash happened
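A minimal environment sketch for enabling queue mode (variable names follow n8n's queue-mode configuration; the Redis host and port are assumptions about your setup):

```
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=redis
QUEUE_BULL_REDIS_PORT=6379
```

Queue mode also requires at least one worker process (started with n8n worker) and a Redis instance reachable at the configured host.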
Webhook Triggers
If crash occurs during webhook execution:
Client receives timeout/connection error
Execution marked as cancelled
Webhook sender should implement retry logic
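On the sender side, a generic retry-with-backoff wrapper is usually enough to ride out a crashed webhook target, since the crash surfaces to the client as a network error or timeout. A minimal sketch (attempt counts and delays are arbitrary placeholders):

```javascript
// Retry an async operation with exponential backoff.
// A crashed n8n instance surfaces to the webhook sender as a
// rejected promise (network error/timeout), which this wrapper retries.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Exponential backoff: 500ms, 1000ms, 2000ms, ...
      const delay = baseDelayMs * 2 ** i;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}
```

A sender would then wrap its webhook call, e.g. withRetry(() => fetch(webhookUrl, { method: 'POST', body })), where webhookUrl points at your n8n webhook.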
Recovery Settings
Configure recovery behavior:
```
# In docker-compose.yml or environment
EXECUTIONS_DATA_SAVE_ON_ERROR=all
EXECUTIONS_DATA_SAVE_ON_SUCCESS=all
EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS=true
```
This ensures you can analyze what happened during crashes.
Testing Checklist
Test graceful shutdown (docker stop)
Test force kill (docker kill)
Test OOM scenarios
Test network isolation
Verify webhook behavior during crash
Check execution recovery after restart
Test with queue mode enabled/disabled
Monitoring Recommendations
For production resilience:
Use Docker health checks
Implement external monitoring (Prometheus/Grafana)
Set up restart policies: restart: unless-stopped
Enable execution data persistence
Configure proper logging
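Putting the Docker-side recommendations together, here is a sketch of a docker-compose service definition. It assumes the /healthz endpoint (available on recent n8n versions) and that wget exists in the image; adjust ports, volumes, and intervals to your setup:

```yaml
services:
  n8n:
    image: n8nio/n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:5678/healthz || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
    volumes:
      - n8n_data:/home/node/.n8n
volumes:
  n8n_data:
```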
Summary:
Error = Workflow logic failure while n8n is running
Cancelled = Execution interrupted by external force (crash, manual stop, etc.)
Let me know if you need help setting up specific crash recovery scenarios or have questions about production resilience!
Thanks for the insightful information.
I tried to simulate failures on a workflow that has about 20 HTTP Request nodes. I terminated the Docker container by sending SIGTERM (Ctrl+C); it entered the graceful shutdown period and then stopped.
The execution showed status=error, and the error at the node was "Execution stopped at this node".
In another workflow I tried the same experiment, but I got status=cancelled.
The point is that I was able to retry the executions with status=error, but I couldn't retry the cancelled ones, although I have enabled "Save execution progress" in the workflow settings.
Is there any way to make the cancelled executions retryable so they can be recovered?