Describe the problem/error/question
We’re having trouble with a workflow in queue mode that includes a Wait node. Whenever the wait duration is 1 minute 5 seconds (65 seconds) or longer, the workflow re-executes from the beginning under the same execution ID. The previous run continues for a while but stops shortly afterwards. The same thing happens with nodes that take too long to process, except possibly query nodes.
What is the error message (if any)?
There are no error messages.
Please share your workflow
Sorry, I’m not sure I’m allowed to share the details, but it’s reproducible with a Schedule Trigger node feeding an Edit Fields node, then any Code node (to log the execution ID, etc.), and finally a Wait node with a wait time of 65 seconds or more.
Information on your n8n setup
- n8n version: 1.91
- Database (default: SQLite): Postgres (Aurora)
- n8n EXECUTIONS_PROCESS setting (default: own, main): queue
- Running n8n via (Docker, npm, n8n cloud, desktop app): AWS
- Operating system: Linux
- Redis: v8.0 on Linux
We have tried disabling health checks and increasing the worker lock duration, following advice similar to that given here: https://community.n8n.io/t/workflow-picked-up-multiple-times-by-multiple-workers/33581
The CPU and memory load are consistently very low. The AWS tasks are not crashing or restarting.
Here is some additional information about the Environment Variables:
// only for the main service
// 8 vCPUs (8 × 1024)
// 16 GB RAM (16 × 1024)
N8N_RUNNERS_ENABLED = "true";
N8N_RUNNERS_MODE = "internal";
OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS = "true";
// Worker and Main service
GENERIC_TIMEZONE: "Australia/Brisbane",
TZ: "Australia/Brisbane",
DB_POSTGRESDB_HOST: <hostname>,
DB_POSTGRESDB_USER: <username>,
DB_POSTGRESDB_PASSWORD: <password>,
DB_TYPE: "postgresdb",
EXECUTIONS_MODE: "queue",
QUEUE_HEALTH_CHECK_ACTIVE: "true",
QUEUE_BULL_REDIS_HOST: "redis.n8n",
QUEUE_BULL_REDIS_PORT: "<redacted number>",
QUEUE_WORKER_LOCK_DURATION = "120000";
QUEUE_HEALTH_CHECK_ACTIVE = "false";
N8N_LOG_LEVEL = "debug";
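As a sanity check on the listing above, a minimal TypeScript sketch like the following could flag variables that are assigned more than once with different values (QUEUE_HEALTH_CHECK_ACTIVE appears twice above). The helper names parseAssignments and findConflicts are hypothetical and not part of n8n. The sketch only reports duplicates; which value actually takes effect depends on how the deployment merges them, which is not modelled here.

```typescript
// Hypothetical helpers (not part of n8n): find environment variables that are
// assigned more than once with different values.
// Accepts the loose "KEY: value," / "KEY = value;" syntax used in the listing.

interface Assignment {
  key: string;
  value: string;
}

function parseAssignments(text: string): Assignment[] {
  // KEY, then ":" or "=", then the value, then an optional trailing "," or ";".
  const pattern = /^\s*([A-Z][A-Z0-9_]*)\s*[:=]\s*(.*?)\s*[,;]?\s*$/;
  const result: Assignment[] = [];
  for (const line of text.split("\n")) {
    const match = pattern.exec(line);
    if (match === null) continue; // skip comments and blank lines
    // Strip one leading and one trailing quote (straight or curly).
    const value = match[2].replace(/^["“”]|["“”]$/g, "");
    result.push({ key: match[1], value });
  }
  return result;
}

function findConflicts(assignments: Assignment[]): Record<string, string[]> {
  // Collect the distinct values seen for each key, in order of appearance.
  const seen: Record<string, string[]> = {};
  for (const { key, value } of assignments) {
    if (seen[key] === undefined) seen[key] = [];
    if (seen[key].indexOf(value) === -1) seen[key].push(value);
  }
  // Keep only the keys that were given more than one distinct value.
  const conflicts: Record<string, string[]> = {};
  for (const key of Object.keys(seen)) {
    if (seen[key].length > 1) conflicts[key] = seen[key];
  }
  return conflicts;
}

// A few lines copied from the listing above.
const sample = [
  'EXECUTIONS_MODE: "queue",',
  'QUEUE_HEALTH_CHECK_ACTIVE: "true",',
  'QUEUE_WORKER_LOCK_DURATION = "120000";',
  'QUEUE_HEALTH_CHECK_ACTIVE = "false";',
].join("\n");

const conflicts = findConflicts(parseAssignments(sample));
// Reports QUEUE_HEALTH_CHECK_ACTIVE with the values "true" and "false".
console.log(conflicts);
```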
During each run we log $execution.id and $workflow.id, and both values are identical every time. The workflow re-executes indefinitely. If further nodes follow, the superseded run sometimes advances a few nodes, but it never gets far and always stops before completing its remaining work.
We have four worker tasks and one main task.
I have 5 minutes of the tasks’ logs filtered to the repeating execution ID 8140.
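To make the pattern in those logs easier to compare against settings such as the worker lock duration, a sketch along these lines could summarize how often each execution ID was started, by which tasks, and how far apart. The StartRecord shape and the summarizeRestarts helper are assumptions for illustration, not an n8n log format or API, and the timestamps and task names in the sample are synthetic, not taken from our logs.

```typescript
// Hypothetical record: one entry per "execution started" log line, already
// extracted from the raw task logs by whatever means is available.
interface StartRecord {
  timestamp: string;   // ISO 8601 time of the log line
  task: string;        // name of the task (main or worker) that logged it
  executionId: string;
}

interface RestartSummary {
  executionId: string;
  starts: number;        // how many times the execution was started
  tasks: string[];       // distinct tasks that started it, in time order
  gapsSeconds: number[]; // seconds between consecutive starts
}

function summarizeRestarts(records: StartRecord[]): RestartSummary[] {
  // Group the start records by execution ID.
  const byExecution: Record<string, StartRecord[]> = {};
  for (const record of records) {
    if (byExecution[record.executionId] === undefined) {
      byExecution[record.executionId] = [];
    }
    byExecution[record.executionId].push(record);
  }

  const summaries: RestartSummary[] = [];
  for (const executionId of Object.keys(byExecution)) {
    const sorted = byExecution[executionId]
      .slice()
      .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
    if (sorted.length < 2) continue; // started once: nothing to report

    const gapsSeconds: number[] = [];
    for (let i = 1; i < sorted.length; i++) {
      const gapMs = Date.parse(sorted[i].timestamp) - Date.parse(sorted[i - 1].timestamp);
      gapsSeconds.push(gapMs / 1000);
    }

    const tasks: string[] = [];
    for (const record of sorted) {
      if (tasks.indexOf(record.task) === -1) tasks.push(record.task);
    }

    summaries.push({ executionId, starts: sorted.length, tasks, gapsSeconds });
  }
  return summaries;
}

// Synthetic sample data; only the execution ID 8140 comes from the post.
const sampleStarts: StartRecord[] = [
  { timestamp: "2025-01-01T00:00:00Z", task: "worker-1", executionId: "8140" },
  { timestamp: "2025-01-01T00:01:10Z", task: "worker-2", executionId: "8140" },
  { timestamp: "2025-01-01T00:02:20Z", task: "worker-1", executionId: "8140" },
  { timestamp: "2025-01-01T00:00:30Z", task: "worker-3", executionId: "8141" },
];

const restartSummaries = summarizeRestarts(sampleStarts);
console.log(JSON.stringify(restartSummaries, null, 2));
```

If the gaps cluster around a fixed interval, that interval can be compared with the configured timeouts; if the restarts always come from a different task, that points towards the job being picked up again rather than retried in place.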
What is causing the re-execution? Having to guarantee that every execution is idempotent would mean considerable extra work for us. How can we prevent this from happening, and what further troubleshooting steps can we take?