Recovery from failure (Community Edition)

In an n8n workflow with multiple stages and nodes, how can I ensure that after an unexpected server crash, the workflow execution automatically resumes from the exact node where the failure happened, instead of starting the execution from the very first node?
Note: I am using my local server/Docker, not cloud.

Information on your n8n setup

  • n8n version: latest
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own):
  • Running n8n via (Docker, desktop app):

My honest answer: you can't guarantee an automatic resume from the exact failed node after a server crash with default settings in a self-hosted instance. I don't think n8n is designed as a fully resilient, transactional workflow engine with checkpointing, like Apache Airflow.

Is n8n Enterprise Cloud the only way to guarantee workflow execution resumes from the failure point?

I don't know if this resolves your issue, but check it out.

Thanks, but I was asking about how it continues where it stopped in the case of a server crash/failure, not in the case of an error.

Hi @mohamedelnady-406, I'm not sure how n8n Enterprise works for your use case; however, I don't think this is a core feature of n8n as a platform. I feel your requirement is more of an architectural design requirement to cater for volatile situations. Have a read up on event-driven architecture and pub/sub.

Essentially, you would put a queueing system (MQ) in place to keep track of the things that need to be processed, and your workflows would then pick tasks from the queue and process them either concurrently or serially. This way, when the system goes down for whatever reason, it can essentially continue where it left off.
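The key property that makes this work is ack-after-processing: the broker only forgets a task once the consumer acknowledges it, so anything in flight at crash time gets redelivered. Here is a toy in-memory sketch of that semantic in Python (the `AckQueue` class and all names are invented for illustration; a real setup would use a broker such as RabbitMQ or Redis, not this code):

```python
import queue

class AckQueue:
    """Toy at-least-once queue: a task stays 'in flight' until the
    consumer acks it, so a crash mid-processing means it is
    redelivered after recovery instead of being lost."""

    def __init__(self):
        self._pending = queue.Queue()
        self._in_flight = {}  # delivery tag -> task
        self._next_tag = 0

    def publish(self, task):
        self._pending.put(task)

    def consume(self):
        task = self._pending.get_nowait()
        tag = self._next_tag
        self._next_tag += 1
        self._in_flight[tag] = task
        return tag, task

    def ack(self, tag):
        # Only an explicit ack removes the task for good.
        del self._in_flight[tag]

    def recover(self):
        # After a crash/restart, requeue everything never acked.
        for task in self._in_flight.values():
            self._pending.put(task)
        self._in_flight.clear()

q = AckQueue()
q.publish("order-1")
tag, task = q.consume()
# ...imagine the n8n server crashes here, so ack(tag) never runs...
q.recover()                # the unacked task is redelivered
tag2, task2 = q.consume()
assert task2 == "order-1"  # processed again after recovery
```

The trade-off is that the consumer may see a task more than once, so whatever the workflow does with it should be safe to repeat.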

Hi @Wouter_Nigrini, that's really insightful, but I'm thinking of a scenario: an event has been consumed from the MQ, and the n8n server fails while processing that event. How do I know the last point of execution so that, after recovery, I can continue from the point where it failed instead of from the very beginning of the workflow?
In other words, the main focus is how to let the n8n server recover and re-execute from the point it failed at.
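Resuming at "the last point of execution" is essentially a checkpointing problem: after each step, persist a marker of the last completed step, and on restart skip everything at or before the marker. n8n does not do this per node out of the box; the sketch below is an illustrative Python pattern with invented names (`run_with_checkpoint`, the state-file layout), not an n8n API:

```python
import json
import os
import tempfile

def run_with_checkpoint(steps, state_file):
    """Run steps in order, persisting the index of the last
    completed step so a restart resumes after it (illustrative
    pattern, not an n8n feature)."""
    done = -1
    if os.path.exists(state_file):
        with open(state_file) as f:
            done = json.load(f)["last_completed"]
    for i, step in enumerate(steps):
        if i <= done:
            continue  # already finished before the crash
        step()
        with open(state_file, "w") as f:
            json.dump({"last_completed": i}, f)

# Simulate a crash during the second step, then a restart.
calls = {"a": 0, "b": 0}
crash_once = {"armed": True}

def step_a():
    calls["a"] += 1

def step_b():
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated server crash")
    calls["b"] += 1

state = os.path.join(tempfile.mkdtemp(), "state.json")
try:
    run_with_checkpoint([step_a, step_b], state)  # dies in step_b
except RuntimeError:
    pass
run_with_checkpoint([step_a, step_b], state)      # restart
assert calls == {"a": 1, "b": 1}  # step_a was not re-executed
```

Note that a step interrupted mid-way is rerun in full, so each step still needs to be safe to repeat.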

What if you split your workflow into smaller segments that you don’t mind being retried as a whole?

For example:

  1. main workflow
  2. first batch of nodes as a sub-workflow
  3. second step as another sub-workflow
  4. and so on

When n8n goes down, it retries the whole workflow. If each workflow is just one logical step, that’s usually fine to rerun.

This setup also helps in other cases, like doing full retries at the workflow level when something fails, or scaling individual flows independently.
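What makes "retry the whole segment" safe is idempotency: each sub-workflow should produce the same end state no matter how many times it runs. A toy Python sketch of that design (all names invented; in n8n the segments would be Execute Workflow / sub-workflow calls, not Python functions):

```python
# A fake datastore shared by the workflow's segments.
db = {}

def upsert_user():
    """Idempotent segment: keyed writes (upserts) instead of
    appends, so rerunning never creates duplicates."""
    db["user:42"] = {"name": "Ada"}

def upsert_invoice():
    db["invoice:7"] = {"user": "user:42", "total": 100}

def main_workflow():
    # Each logical step is one retryable unit.
    upsert_user()
    upsert_invoice()

main_workflow()  # first attempt (imagine it crashed right after)
main_workflow()  # full retry reruns everything from the top
assert db == {
    "user:42": {"name": "Ada"},
    "invoice:7": {"user": "user:42", "total": 100},
}  # same state as a single successful run, no duplicates
```

With segments built this way, you never need to know the exact failed node: rerunning the whole (sub-)workflow converges to the same result.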