I’ve been looking for a list of scenarios in which n8n crashes during use, and I couldn’t find one. So this message is more a request for feedback from experience than a proper question. I’m interested in how n8n behaves when any of its processes is killed, and whether running workflows resume, crash, or do nothing.
Here are a few tests I ran using either a cron or a webhook trigger. I’d be happy to get more feedback from other members:
1. Kill the main process while a workflow is running on a worker process. Expected behavior: since a scheduled workflow is supposed to run on a worker process, the workflow should keep running. Result: the workflow kept running and completed successfully.
2. Kill the main process just before a workflow is scheduled to start. Expected behavior: the workflow is triggered anyway on the worker side. Result: the workflow didn’t start at all.
3. Kill the workers while a workflow is running on them. Expected behavior: the workflow should keep running or resume from its crashed state. Result: the workflow continued until its end. But note that killing the main process stops the execution.
4. Kill the webhook process once the webhook has been called. Expected behavior: the webhook call should be stopped, but the workflow should be checked and resumed by main. Result: the webhook call was dropped (Bad Gateway), but the execution kept running on the worker until completion.
5. Kill the webhook process before the webhook call. Expected behavior: as the call will not reach the webhook process, it should fail. Result: the call was dropped (Gateway Timeout) and the main process didn’t attempt anything.
Information on your n8n setup
n8n version: 0.207.1
Database you’re using (default: SQLite): PostgreSQL
Running n8n with the execution process [own(default), main]: own
Running n8n via [Docker, npm, n8n.cloud, desktop app]: k8s with 1 main, 1 webhook, 3 workers & 1 redis
That is some interesting testing. I am surprised that when the main process isn’t running, the workers don’t continue with their schedules. @krynble is this what you would expect to see?
Hey @Romuald_BARON I really appreciate the time invested.
I think there is something weird in scenario (3), where you mention that you kill the worker while an execution is running on it and the execution goes on until the end. If you kill the process gracefully (via SIGTERM), it should try a graceful shutdown, allowing up to 30 seconds for the execution to finish. In this case, this is fine and expected. But if the execution is a long-running one, it should be interrupted immediately when you kill the worker. Killing the main process should have no influence on the execution of a worker.
What might also be the case above is that you’re running what we call a manual execution, where you manually click to execute a workflow. Manual executions are always executed by the main n8n process and never delegated to workers.
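To make the graceful-shutdown behaviour in scenario (3) a bit more concrete, here is a minimal sketch of the idea: on shutdown, wait for in-flight executions up to a grace period, then treat whatever is still running as interrupted. This is not n8n's actual code. `WorkerSketch`, `start_execution` and `shutdown` are hypothetical names, the executions are simulated with `time.sleep`, and only the 30-second figure comes from the explanation above.

```python
import threading
import time

GRACE_PERIOD_SECONDS = 30  # the "up to 30 seconds" mentioned above


class WorkerSketch:
    """Toy stand-in for a worker process (hypothetical, not n8n's implementation)."""

    def __init__(self, grace_period=GRACE_PERIOD_SECONDS):
        self.grace_period = grace_period
        self.running = {}  # execution id -> thread simulating the execution

    def start_execution(self, execution_id, duration):
        # Simulate an execution that takes `duration` seconds.
        thread = threading.Thread(target=time.sleep, args=(duration,), daemon=True)
        self.running[execution_id] = thread
        thread.start()

    def shutdown(self):
        """Wait for in-flight executions, but never longer than the grace period.

        Returns the ids of executions that were still running at the deadline.
        """
        deadline = time.monotonic() + self.grace_period
        interrupted = []
        for execution_id, thread in self.running.items():
            remaining = max(0.0, deadline - time.monotonic())
            thread.join(timeout=remaining)
            if thread.is_alive():
                interrupted.append(execution_id)
        return interrupted
```

In a real process, something like `shutdown()` would be called from a SIGTERM handler (for example one registered with `signal.signal`). A hard kill (SIGKILL) gives the process no chance to run any of this, which would match an execution being cut off immediately.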
About (4): when you have webhook + worker processes, the execution happens without any involvement of the main process. In fact, main could even be killed and it wouldn’t matter. Webhook and worker processes can communicate with each other directly (using Redis + Database for brokerage and data sync).
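The webhook-to-worker hand-off described for (4) can be pictured roughly like this. It is only a toy model under my own assumptions: an in-memory `queue.Queue` stands in for Redis, a dict stands in for the database, and `webhook_receive` / `worker_run_once` are made-up names. The point is simply that nothing in this path goes through the main process.

```python
import queue

broker = queue.Queue()  # stand-in for Redis (job brokerage)
database = {}           # stand-in for the shared database (execution data)


def webhook_receive(execution_id, payload):
    """Webhook process: persist the execution data, then enqueue the job."""
    database[execution_id] = {"payload": payload, "status": "new"}
    broker.put(execution_id)


def worker_run_once():
    """Worker process: pick up one queued job and run it.

    Raises queue.Empty if nothing is waiting.
    """
    execution_id = broker.get_nowait()
    record = database[execution_id]
    # A real worker would run the workflow here; this sketch only marks it done.
    record["status"] = "success"
    return execution_id
```

Once the job is in the queue, the worker no longer needs the webhook process either, which would be consistent with test 4: the HTTP call was dropped but the execution still completed.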
I’ve intentionally left (2) to the end because this is the most special one.
All triggers that do not rely on HTTP requests to begin (i.e. are not webhook related) are always started by the main n8n process. This includes any sort of time trigger (Cron, Schedule or Interval nodes) as well as queue triggers (such as RabbitMQ) and polling triggers (such as the Clockify Trigger node).
The above examples of triggers start execution from something other than an HTTP request call. These are all started by the main n8n process and if it’s not running, the executions would never start.
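As a quick way to summarise the rule above, here is a small sketch for a setup like the one in this thread (separate main, webhook and worker processes). The trigger categories and function names are my own labels for illustration, not n8n identifiers.

```python
# Trigger kinds that begin with an HTTP request vs. everything else.
HTTP_TRIGGERS = {"webhook"}
MAIN_ONLY_TRIGGERS = {"cron", "schedule", "interval", "queue", "polling"}


def starter_for(trigger_kind):
    """Return which process is responsible for starting this kind of trigger."""
    if trigger_kind in HTTP_TRIGGERS:
        return "webhook"
    if trigger_kind in MAIN_ONLY_TRIGGERS:
        return "main"
    raise ValueError(f"unknown trigger kind: {trigger_kind}")


def will_start(trigger_kind, alive_processes):
    """True if the process that owns the trigger is currently running."""
    return starter_for(trigger_kind) in alive_processes
```

With main down, `will_start("cron", {"webhook", "worker"})` is `False` (test 2 above), while `will_start("webhook", {"webhook", "worker"})` is still `True`.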