n8n workflow crashes repeatedly in version 2.1.4 without any error showing in the workflow.
I have enabled “Save execution progress”, and I can see that the workflow simply stops on a different node each time, without any error message.
What’s happening? Where can I find debug information?
This same workflow runs successfully on self-hosted n8n, but I want to move to Cloud on v2+, so I have started migrating workflows and making the small alterations needed. However, this appears to be more of a cloud platform issue than a node/version issue.
Is it possible to see system logs or out of memory errors in the n8n cloud hosted option?
I don’t believe you can see system logs on the n8n Cloud version, but I know your issue: the workflow crashed because the data load was too heavy. Add some loop nodes and set them to process one item at a time. That usually fixes it for me!
Agreed, I’ve installed 2.1.4 on an old, low-powered mini PC and the same workflow works fine, so it’s not the version; it must be the environment and memory limitations.
Will try and tweak it like you suggest; it is quite a heavy data-processing workflow.
Did all nodes finish every iteration? My cloud-hosted issue was almost certainly an out-of-memory problem.
Running the same workflows locally, I can see the container memory spiking to >2 GB.
That was causing silent crashes for me.
Running locally I still found problems with my Code node JavaScript. Some of it was improved by reducing data volume and simplifying data processing, but I also think the v2 Code node runner architecture is slower and less reliable. That matters for cloud-hosting other people’s workflows, but not if you’re self-hosting your own.
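To illustrate the “reducing data and simplifying processing” point: chained array methods allocate a full intermediate copy at every step, while a single pass keeps peak memory much lower. This is a made-up sketch (field names and numbers are invented for the example), not the actual workflow code:

```javascript
// Memory-heavy style: each .map()/.filter() allocates a full copy of the data.
function summarizeHeavy(rows) {
  const parsed = rows.map(r => ({ ...r, total: r.qty * r.price })); // copy 1
  const big = parsed.filter(r => r.total > 100);                    // copy 2
  return big.reduce((sum, r) => sum + r.total, 0);
}

// Leaner style: one pass over the data, no intermediate arrays.
function summarizeLean(rows) {
  let sum = 0;
  for (const r of rows) {
    const total = r.qty * r.price;
    if (total > 100) sum += total;
  }
  return sum;
}

const rows = [
  { qty: 2, price: 60 },  // total 120 -> counted
  { qty: 1, price: 50 },  // total 50  -> skipped
  { qty: 10, price: 30 }, // total 300 -> counted
];
console.log(summarizeLean(rows));                          // 420
console.log(summarizeLean(rows) === summarizeHeavy(rows)); // true
```

In an n8n Code node the input would come from `$input.all()` rather than a hard-coded array, but the same one-pass pattern applies.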
There are various memory options you can tweak via environment variables when self-hosting, and you can also turn off the v2 Code node runners if you’re only using JavaScript.
# 1. Uncap the memory limit for data payloads.
# Default is often ~16MB. This bumps it to 256MB.
N8N_PAYLOAD_SIZE_MAX=256

# 2. Increase the Node.js heap size.
# Allows the underlying Node process to use more RAM before crashing.
# Set this to roughly 75-80% of your available VPS RAM in MB
# (e.g., 3072 for 4GB RAM, 6144 for 8GB RAM).
NODE_OPTIONS="--max-old-space-size=4096"

# 3. Increase the task runner timeout.
# Prevents n8n from killing data-hungry nodes if they take >60s to serialize/move data.
N8N_RUNNERS_TASK_TIMEOUT=300

# 4. Offload binary data to disk.
# Stops n8n from storing file uploads/generated PDFs in RAM.
N8N_DEFAULT_BINARY_DATA_MODE=filesystem

# 5. Increase the workflow execution timeout.
# Hard limit (in seconds) for the entire workflow run.
EXECUTIONS_TIMEOUT=3600
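If you run n8n via Docker Compose, these go under the service’s `environment` block; a minimal sketch (the service name and image reference are assumptions, adjust to your own stack):

```yaml
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n   # your image/tag may differ
    environment:
      - N8N_PAYLOAD_SIZE_MAX=256
      - NODE_OPTIONS=--max-old-space-size=4096
      - N8N_RUNNERS_TASK_TIMEOUT=300
      - N8N_DEFAULT_BINARY_DATA_MODE=filesystem
      - EXECUTIONS_TIMEOUT=3600
```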
Environment Variables to Configure
Add or modify the following environment variables in your self-hosted setup (Docker Compose or .env file):
1. Disable Runners
• N8N_RUNNERS_ENABLED=false
• Description: This is the primary switch. Setting it to false forces n8n to execute JavaScript code within the main application process.
• Note: This effectively disables Python support, as Python Code nodes require the runner architecture.
When code runs in the main process, it uses the standard Node.js VM sandbox. You may need to relax permissions if your code imports modules:
• NODE_FUNCTION_ALLOW_BUILTIN=*
• Description: Allows the Code node to import built-in Node.js modules (e.g., fs, crypto, path). Replace * with specific module names if you prefer stricter security.
• NODE_FUNCTION_ALLOW_EXTERNAL=*
• Description: Allows the Code node to import external npm packages installed in the n8n container.
I am also trying to understand the n8n v2 architecture myself (from the IaC viewpoint, since I am interested in AWS CDK).
Anyway, this “benchmark” you did could be helpful if @Jon_James can check your cloud workspace for specific data.
As far as I understand, v2 went for stability and security rather than low resource consumption (I may be wrong, BTW, since I don’t have access to cloud backend settings).
Your setup explained above is great for self-hosted, as on v1.
Anyway, what I’ve understood so far is:
Main delegates to Workers (or to a Task Runner directly, to offload manual executions), and Workers use Task Runners…
```
While there isn’t a single “only way” to combine these components, the n8n v2 architecture follows a hierarchy of delegation designed to maximize stability and security.

The “schema” for a full-scale n8n v2 setup generally follows these principles:

1. The Core Hierarchy
The most robust pattern for scaling in v2 is:
- Main delegates to Workers: the Main instance handles the UI and triggers, then offloads workflow execution to Workers via Redis [Queue mode].
- Workers delegate to Task Runners: when a Worker executes a workflow and hits a Code node, it delegates that specific task to a Task Runner sidecar.

2. Main Instance Scenarios
The Main instance’s relationship with Task Runners depends on how you handle manual testing:
- Manual offloading enabled (OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true): in this recommended setup, the Main instance doesn’t need its own Task Runner because it sends even manual “Test Workflow” requests to the Workers. The Workers then use their own Task Runners.
- Manual offloading disabled: if this is false, the Main instance must have a Task Runner (either internal or an external sidecar) to execute code when you test workflows in the editor.

3. Summary of the “Schema”
A fully scaled n8n v2 production environment typically looks like this:
- Task Runner sidecars: each Worker instance is paired with its own n8nio/runners container to handle JavaScript and Python code execution [External mode].

By following this pattern, you ensure that even if a user’s Python or JS script crashes or consumes too much memory, it only affects the Task Runner sidecar and not the Worker or the Main instance [Task runners for the Code node].
```
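For reference, that hierarchy could be sketched in Docker Compose roughly like this. This is an assumption-heavy sketch, not a tested deployment: service names, the auth token, and the broker port are placeholders, and you should verify the queue-mode and external-runner variable names against the docs for your exact version:

```yaml
services:
  redis:
    image: redis:7                     # queue backend for Queue mode

  main:                                # UI + triggers; offloads execution
    image: docker.n8n.io/n8nio/n8n
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true

  worker:                              # executes workflows from the queue
    image: docker.n8n.io/n8nio/n8n
    command: worker
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - N8N_RUNNERS_ENABLED=true
      - N8N_RUNNERS_MODE=external      # delegate Code nodes to the sidecar
      - N8N_RUNNERS_AUTH_TOKEN=change-me

  runner:                              # task runner sidecar, same version as n8n
    image: n8nio/runners
    environment:
      - N8N_RUNNERS_TASK_BROKER_URI=http://worker:5679
      - N8N_RUNNERS_AUTH_TOKEN=change-me
```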
CORRECT ME if I am wrong, because I am more than curious myself to solve this “mystery”, be prepared to migrate without worrying too much, and plan accordingly.
Cheers!
Will bookmark this topic since I am interested.
P.S. I’m on v2.2.1 (and the task runner MUST be the same version as n8n).
P.P.S. I got lured into the details and forgot to ask: that >2 GB spike, is it being caused by the JS code? Can you share an image of the workflow or the Code node?
Maybe it is too “heavy” in some way and can be improved?
After further investigation, my issue is a different one.
My workflow executions are actually being marked as finished, but when we restart n8n and its workers, the workers run a sort of restore process that marks all executions in their logs as crashed, without checking the DB to see whether they had actually finished.