We are running n8n in queue mode with scaling and Redis, and recently started encountering a large number of executions stuck in the “Error” + “Starting soon” state.
Here are our setup details and what we’ve done so far:
When did this problem start? Did you upgrade? The timestamp you refer to is the runtime of the execution in seconds. Might it have something to do with daylight saving time, i.e. that it got confused? From the description it's not clear whether you deleted everything from the PostgreSQL DB or only a filtered subset. If possible, clear everything.
Exactly, I have tried everything, and it's annoying because it hides all the other runs and makes them hard to analyse. So the only solution for now is to remove them?
You have some waits in there. Can you tell me their duration?
Also, my 2 cents: there may be an execution entity which has a valid status but still has a waitUntil set. n8n might have tried to bring it live again, but the dependencies needed to execute it had already been removed, causing an error.
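A quick way to check for such rows, as a hedged sketch: the table and column names here assume n8n's default Postgres schema (where the column is called `waitTill`) and may differ between n8n versions.

```sql
-- Hedged sketch: list executions that still have a wait timestamp set.
-- Table/column names assume n8n's default Postgres schema
-- ("execution_entity", "waitTill") and may differ by n8n version.
SELECT id, "workflowId", status, "waitTill"
FROM execution_entity
WHERE "waitTill" IS NOT NULL
ORDER BY "waitTill";
```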
In my view, the only way to find out is to delete all execution data, leaving only workflows/credentials and the absolute essentials, and to bring down the containers before doing that in Postgres. Something along these lines:
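This is only a sketch, assuming n8n's default Postgres schema; check your own schema before running anything.

```sql
-- Run only while the n8n containers are down.
-- Table names assume n8n's default Postgres schema; newer versions split
-- execution data into extra tables (e.g. execution_data), so adapt as needed.
TRUNCATE TABLE execution_entity CASCADE;
```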
Same issue here.
But on my instance, it always sets the executions to “Error” as soon as I restart the processes.
So once in the evening all processes are restarted, and then it automatically sets most of the executions to “Error” with a finish date equal to the restart time.
Interestingly, all the other data is still kept in the Postgres DB, so if you look into a failed execution you still have all the data as if it had run just fine, and if you change the flag in the Postgres DB it looks fine once again.
I created a Postgres node that automatically changes the flag back whenever an execution looks like it was failed this way (roughly the query sketched below), and after every update I try deactivating this step again, but so far the issue still comes up. I just live with it for now, haha.
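For anyone curious, the query in that node is roughly the following. This is a hedged sketch, not the exact node: the column and status names assume n8n's default `execution_entity` schema, and the time window is hypothetical, matching whenever your nightly restart runs.

```sql
-- Hedged sketch: flip back executions that were marked "error" right at the
-- nightly restart. Column/status names assume n8n's default Postgres schema;
-- the time window is hypothetical and should match your restart schedule.
UPDATE execution_entity
SET status = 'success', finished = true
WHERE status = 'error'
  AND "stoppedAt"::time BETWEEN '22:00' AND '22:05';
```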
Same issue here.
It persists even when I turn all workflows off, meaning the trigger node is still trying to run while the whole workflow is deactivated.
How can I resolve it?
Hey all,
I was able to troubleshoot and fix this issue.
It’s definitely a weird edge case, but I have a feeling it’s everyone’s issue.
Make sure you don’t allow Redis to be accessed by the public!
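If the Redis container's port is published to the host, anyone on the internet can connect and tamper with the queue. A minimal hardening sketch for `redis.conf` (the values are assumptions, adjust to your setup), on top of not publishing port 6379 in your compose file at all:

```
# redis.conf — keep Redis off the public internet
bind 127.0.0.1 ::1              # listen on loopback only
protected-mode yes              # refuse external clients without auth
requirepass <strong-password>   # placeholder; set a real password
```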