N8n Kubernetes Queue Mode: Jobs Not Distributed Across Multiple Workers (HPA/Scaling Not Working)

Hi all,

I’m running n8n (Enterprise) in Kubernetes using the official Helm chart with queue mode enabled and Valkey (Redis-compatible) as the backend. My goal is to have multiple worker pods process jobs in parallel for scaling, but all jobs are being processed by a single worker, even when I submit a large batch (e.g., 50 jobs).

My setup:

  • n8n main, worker, and webhook pods deployed via Helm

  • queue.bull.redis.host points to a working Valkey instance

  • EXECUTIONS_MODE=queue and N8N_RUNNERS_ENABLED=true

  • Worker section in values.yaml:

    worker:
      enabled: true
      concurrency: 5
      replicaCount: 1
      autoscaling:
        enabled: true
        minReplicas: 2
        maxReplicas: 4
        targetCPUUtilizationPercentage: 70
        targetMemoryUtilizationPercentage: 80
      resources:
        limits:
          cpu: 2000m
          memory: 2Gi
        requests:
          cpu: 500m
          memory: 512Mi
    
  • Scaling worker deployment manually (e.g., to 3 pods) still results in all jobs being processed by a single worker pod (same execution/worker ID in workflow output)

Kindly suggest what needs to be done in this case.


Yeah the key thing here is queue mode distributes executions across workers, not items within a single execution. If you’re sending 50 items through one trigger that’s still just one execution ID going to one worker. You’d need to split those into 50 separate workflow executions (like using a webhook that gets called per item) for the workers to actually share the load.
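The fan-out pattern described above can be sketched as one webhook call per item, so Bull enqueues 50 independent jobs that any worker can claim. The URL and payload shape below are placeholders, and `send` stands in for a real `curl -s -X POST "$url" -d "$payload"` against your production webhook so the sketch runs offline:

```shell
# Stand-in for curl; replace with a real POST to your production webhook.
send() { printf 'POST %s %s\n' "$1" "$2"; }

url="https://n8n.example.com/webhook/process-item"   # hypothetical webhook path
count=0
for i in $(seq 1 50); do
  send "$url" "{\"item\": $i}" > /dev/null   # each call becomes its own execution
  count=$((count + 1))
done
echo "enqueued $count separate executions"
```

Each call produces a distinct execution ID, so the queue can spread them across however many workers are running.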

Thank you for those detailed steps.

That explains why I was seeing the same worker ID.

I have replicated the steps now with queue mode on Kubernetes with multiple workers, and manual scaling works perfectly - jobs distribute correctly across workers when I manually scale the deployment. However, my HPA (Horizontal Pod Autoscaler) is not automatically scaling workers under load.

  • Worker concurrency: set to 1 for testing

  • autoscaling:

        enabled: true
        minReplicas: 1
        maxReplicas: 10
        targetCPUUtilizationPercentage: 70
        targetMemoryUtilizationPercentage: 80

Debugging attempts:

  • Verified metrics-server is running

  • Checked RBAC permissions

  • Confirmed resource requests/limits are set (required for HPA)

  • Manual scaling confirms queue distribution works correctly
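A few checks that help when an HPA exists but never scales: look at what the HPA currently sees as the metric, and confirm pod metrics are actually flowing. The HPA name below is a placeholder for whatever the chart creates:

```
kubectl get hpa -n n8n-system
kubectl describe hpa <worker-hpa-name> -n n8n-system   # "unknown" in TARGETS means no metrics
kubectl top pods -n n8n-system                         # confirms metrics-server is reporting
```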

Any insights on getting n8n worker HPA to function properly would be greatly appreciated!


@akshika.sharma
Glad I could help.
Your HPA isn’t scaling because it monitors CPU, but n8n workers are I/O-bound: they spend most of their time waiting on external responses, not burning CPU. Even with 10,000 queued jobs, CPU usage may stay low (e.g., 10%), so the HPA never crosses its 70% threshold.

You have to trick your current setup:

  • Drop your targetCPUUtilizationPercentage from 70 to something tiny, like 20.
  • Drop your pod’s requests.cpu to a very low baseline (like 100m).

By shrinking the baseline capacity, even a tiny amount of actual processing will mathematically look like a massive CPU spike, which forces the HPA to finally trigger.
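The arithmetic behind that: the HPA computes desired = ceil(currentReplicas × currentUtilization / targetUtilization), where utilization is measured against requests.cpu. A quick sketch with those numbers, assuming the workers actually use about 80m of CPU:

```shell
# With requests.cpu lowered to 100m, 80m of real usage reads as 80% utilization;
# against a 20% target that forces a scale-up from 1 pod to 4.
current_replicas=1
current_util=80    # percent of requests.cpu actually in use
target_util=20     # targetCPUUtilizationPercentage
# ceil(a/b) via integer arithmetic: (a + b - 1) / b
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
echo "desired replicas: $desired"
```

With the original 500m request, that same 80m of usage is only 16% utilization and never approaches a 70% target, which is why the HPA sits still.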

The CPU threshold trick might help a bit, but honestly for queue-based workloads you’re better off looking at KEDA with a Redis scaler that watches the actual Bull queue length; that way pods scale based on how many jobs are waiting rather than trying to infer load from CPU metrics.
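For reference, a minimal KEDA ScaledObject sketch. Everything here is an assumption about your install: the worker Deployment name (n8n-worker), the in-cluster Valkey address, and Bull’s default key layout (n8n’s pending list would then be bull:jobs:wait; verify the real key with redis-cli KEYS 'bull:*'):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: n8n-worker-scaler
  namespace: n8n-system
spec:
  scaleTargetRef:
    name: n8n-worker              # your worker Deployment name
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: valkey.n8n-system.svc:6379   # your Valkey service
        listName: bull:jobs:wait              # n8n's Bull waiting list (verify)
        listLength: "5"                       # target pending jobs per replica
```

With this in place you would disable the chart’s CPU-based autoscaling so the two mechanisms don’t fight over the replica count.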

Thanks, I am just playing around with the settings to understand what works best. I tried with really low limits.

CHART       APP VERSION
n8n-2.0.1   1.122.4

kubectl get hpa -n n8n-system
No resources found in n8n-system namespace.

I removed replicaCount as well, but I still don’t see any HPA getting created.

So I am wondering whether HPA is intrinsically supported in the chart or not.

worker:
  enabled: true
  config: {}
  secret: {}
  concurrency: 1

  persistence:
    accessModes:
      - ReadWriteOnce
    # Persistent Volume size
    size: 1Gi
    # Use an existing PVC
    # existingClaim:

  # Number of desired pods.
  # replicaCount: 1

  deploymentStrategy:
    type: "Recreate"
    maxSurge: "10%"
    # maxUnavailable: "50%"
  resources:
    limits:
      cpu: 2000m
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 512Mi

  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 4
    targetCPUUtilizationPercentage: 10
    targetMemoryUtilizationPercentage: 20
    # targetMemoryUtilizationPercentage: 80

  nodeSelector: {}
  tolerations: []
  affinity: {}
  extraVolumeMounts: *extraVolumeMounts
  extraVolumes: *extraVolumes


Hi @akshika.sharma
If kubectl get hpa returns nothing, the chart isn’t creating the HPA resource at all, so this isn’t a queue issue; and since manual scaling works and jobs distribute correctly, queue mode itself is fine. I would double-check the chart version and render the templates with helm template to confirm whether a worker HPA is actually supported and being generated, because in some chart versions autoscaling is only applied to specific components or requires a different values structure.
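One quick way to check, sketched with placeholder release and repo names (adjust to your install):

```
helm template my-n8n <repo-alias>/n8n -n n8n-system -f values.yaml \
  | grep -n 'kind: HorizontalPodAutoscaler'
```

If grep prints nothing, the chart never renders a worker HPA from your values, regardless of what autoscaling.enabled says.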


Thank you all. KEDA + Redis was a smooth option.

@akshika.sharma uhul :tada:
If your question was solved, please consider leaving a like or marking the reply as the solution
(it helps others find the answer more easily and also supports community contributors.)