I am deploying a new n8n queue-mode setup (main + workers, runners, Redis, Postgres).
Everything works correctly under normal operation. However, after restarting the stack to apply configuration changes, the workers become stuck while recovering crashed executions.
I discovered around 750 crashed executions in the UI, and shortly after that, all executions that were previously marked as successful suddenly switched to an error state.
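For context, here is a minimal sketch of the kind of queue-mode compose setup described above. The environment variable names are n8n's documented ones; the hostnames, credentials, and volume names are placeholders for this example, not taken from the actual instance:

```yaml
# Illustrative queue-mode sketch, NOT the poster's actual config.
services:
  n8n-main:
    image: n8nio/n8n:latest
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis          # placeholder hostname
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres          # placeholder hostname
      - N8N_RUNNERS_ENABLED=true
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
  n8n-worker:
    image: n8nio/n8n:latest
    command: worker
    environment:                             # workers need the same Redis/DB/key settings
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
  redis:
    image: redis:7
  postgres:
    image: postgres:16
```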
What is the error message (if any)?
Please share your workflow.
I was testing with simple Code nodes and Wait nodes to test concurrency.
The workers do recover after some time. I think what's happening is that the recovery process looks at previous executions, doesn't see a finish time or status because it's looking at the logs rather than the DB, and then marks them as crashed.
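The failure mode being hypothesized here can be sketched roughly as follows. This is purely illustrative pseudologic, not n8n's actual recovery code; all names are made up:

```python
# Illustrative sketch of the suspected bug, NOT n8n source code.
# The DB says an execution succeeded, but the event log from before
# the restart has no "finished" entry, so a recovery routine that
# trusts the log over the DB would mark the execution as crashed.

db_status = {"exec-1": "success", "exec-2": "success"}  # source of truth
event_log = {}  # rotated/wiped on restart: no finish events survive


def recover(execution_id: str) -> str:
    # Buggy behavior: consults the log instead of the DB.
    if event_log.get(execution_id) != "finished":
        return "crashed"
    return db_status[execution_id]


print(recover("exec-1"))  # "crashed", even though the DB says success
```

If the real logic works anything like this, it would also explain why previously successful executions flip to an error state after a restart.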
The longer I keep it running, the more crashed executions I see.
No errors in the logs; the worker is just stuck in recovery.
I made a custom image of v2.6.3 with the recovery process disabled and am using it for the workers. It has been stable so far.
Thanks for the detailed info. The fact that disabling recovery on the workers restores stability confirms that's the root cause, but honestly this sounds like a bug that should be reported to the n8n team. Tagging @BramKn here for help.
I have seen weird behaviour in older versions, but those issues were patched.
It seems like it could be a setup issue or a bug; I'm not sure.
You would need to give us a lot more information about the instance setup to determine which it is and, if warranted, report a bug.