K8s manifest best practices for code node heavy tasks and concurrency (timeouts and deployment strategy)

Describe the problem/error/question

I'm running a meta-orchestrator-driven, heavily hierarchical workflow implementation with a main + sidecar runners setup. What is the best way to ensure smooth execution of workflows?

Currently, when I run the heavier workflows, multiple main pods (or sometimes runner pods) spawn up, causing timeouts or crashes.

I know part of this is about understanding Kubernetes better, which I'm new to. Looking forward to hearing from folks who have already productionized n8n on K8s.

So far I'm playing around with timeout configs, deployment strategies (rollingUpdate / Recreate), and env vars (MAX_SPACE | RUNNERS_TIMEOUT). Any guidance, on these points or beyond, will be appreciated.
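For concreteness, here's a minimal sketch of the kind of Deployment fragment I mean (values and names are illustrative, not my actual manifest; `EXECUTIONS_TIMEOUT` is a documented n8n env var, but verify against the docs for your version):

```yaml
# Illustrative fragment only -- not a complete manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-main
spec:
  replicas: 1
  strategy:
    type: Recreate        # avoids two main pods overlapping during a rollout
  template:
    spec:
      containers:
        - name: n8n
          image: n8nio/n8n:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"   # cap memory for Code-node-heavy runs
          env:
            - name: EXECUTIONS_TIMEOUT   # soft execution timeout, in seconds
              value: "3600"
```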

What is the error message (if any)?

No explicit error message; the pods time out or crash. This also happens to the runners sometimes during Code-node-heavy executions.

Please share your workflow

(Select the nodes on your canvas and use the keyboard shortcuts CMD+C/CTRL+C and CMD+V/CTRL+V to copy and paste the workflow.)

Share the output returned by the last node

Information on your n8n setup

  • n8n version: 2.1.1
  • Database (default: SQLite): postgres
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): k8s
  • Operating system: linux

Hi @adarsh-lm

I wouldn't focus on rollingUpdate vs. Recreate first. For Code-node-heavy workloads, I'd treat this as a scaling issue: use queue mode, keep the main pod lean, run task runners in external mode, and set execution timeouts so stuck runs don't pile up. I'd also verify whether those extra n8n-main pods are coming from your K8s Deployment/HPA rather than from n8n itself (check with `kubectl get hpa` and `kubectl describe deployment`), since n8n doesn't spawn main pods on its own. And if you're still on 2.1.1, I'd seriously consider upgrading before tuning too much.
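To make the queue-mode suggestion concrete, a minimal sketch of the env block for the main container might look like this. The Redis host/port values are placeholders; `EXECUTIONS_MODE`, `QUEUE_BULL_REDIS_HOST`, `N8N_RUNNERS_ENABLED`, and the timeout vars are documented n8n settings, but double-check the exact names and defaults against the docs for your n8n version:

```yaml
# Illustrative env block for the n8n main container in queue mode.
env:
  - name: EXECUTIONS_MODE
    value: "queue"                 # main enqueues work; worker pods execute it
  - name: QUEUE_BULL_REDIS_HOST
    value: "redis.n8n.svc"         # placeholder: your Redis service hostname
  - name: QUEUE_BULL_REDIS_PORT
    value: "6379"
  - name: N8N_RUNNERS_ENABLED
    value: "true"                  # offload Code node execution to task runners
  - name: EXECUTIONS_TIMEOUT
    value: "1800"                  # default soft timeout, in seconds
  - name: EXECUTIONS_TIMEOUT_MAX
    value: "3600"                  # hard cap workflows cannot exceed
```

With this split, the main pod stays responsive (webhooks, UI, scheduling) while workers and runners absorb the heavy Code-node executions, so a crash in one heavy run doesn't take down the whole instance.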
