Constant 503s

I’m regularly getting 503s as soon as there is a bit of traffic on the n8n instance. I have hundreds of workflows, but the load is pretty low (10-30 concurrent executions). How can I approach debugging this?

Helm chart:

global:
  autoscaling:
    enabled: true
    minReplicas: 5
n8n:
  n8n:
    concurrency: 30
    scaling:
      webhook:
        count: 10
      worker:
        concurrency: 15
        count: 12
    webhookResources:
      limits:
        cpu: 2
        memory: 1Gi
      requests:
        cpu: 200m
    workerResources:
      limits:
        cpu: 2
        memory: 3Gi
      requests:
        cpu: 1

I have very long workflows (1 hour+) that call other workflows.

Here are 30 minutes of n8n logs from while the instance was returning 503s:
https://file.io/tuoSnpuxhvgw

Thanks a lot

Hi @Lesterpaintstheworld, that link appears to have expired, but in general it’s very hard to say what might be happening here without a reproducible example.

Is there anything obvious going on, for example unstable network connections to the n8n database?
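If you want a quick way to check the database connection from inside the cluster, here is a rough Python sketch that opens a few TCP connections and records how long each one takes. The function name `probe_tcp` and the host and port in the comment are placeholders of mine, not anything n8n provides. It only tests reachability at the TCP level, so it won't catch problems such as an exhausted connection pool on the database side.

```python
import socket
import time


def probe_tcp(host, port, attempts=5, timeout=2.0):
    """Open `attempts` TCP connections to host:port.

    Returns a list with one entry per attempt: the connect time in
    seconds, or None if the connection failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection raises OSError (including timeouts and
            # DNS failures) when the connection cannot be established.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            results.append(None)
    return results


# Hypothetical usage; replace host and port with your own database service:
# print(probe_tcp("my-postgres.example.internal", 5432))
```

Running it from a pod in the same namespace as the n8n workers while the 503s are happening would show whether connections fail or slow down at that moment.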

If not, it might be worth enabling debug logging as a first step (for example by setting the N8N_LOG_LEVEL environment variable to debug). If that doesn't show any obvious errors, take a look at the n8nEventLog.log file in your .n8n directory. Among other entries, this file should contain the most recent nodes executed by your n8n instance, so it can help track down where things went wrong.
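To get a quick overview of that event log, something like the Python sketch below can help. It assumes the file has one JSON object per line with `eventName` and `ts` fields; please check that against your own file, since the exact layout may differ between n8n versions. The function name `summarize_event_log` is mine, and lines that don't parse are skipped.

```python
import json
from collections import Counter


def summarize_event_log(lines, tail=5):
    """Count events by name and return the last `tail` events seen.

    Assumes each line is a JSON object with "eventName" and "ts" keys
    (an assumption about the log layout; verify against your file).
    """
    counts = Counter()
    recent = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(entry, dict):
            continue
        name = entry.get("eventName", "<unknown>")
        counts[name] += 1
        recent.append((entry.get("ts"), name))
    return counts, recent[-tail:]


# Hypothetical usage; adjust the path to your .n8n directory:
# with open("/home/node/.n8n/n8nEventLog.log", encoding="utf-8") as f:
#     counts, recent = summarize_event_log(f)
#     print(counts.most_common(10))
#     print(recent)
```

The last few entries are usually the interesting part, since they show what was running just before the instance stopped responding.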