I’m running n8n (Enterprise) in Kubernetes using the official Helm chart, with queue mode enabled and Valkey (Redis-compatible) as the backend. My goal is to have multiple worker pods processing jobs in parallel, but all jobs end up on a single worker, even when I submit a large batch (e.g., 50 jobs).
My setup:
n8n main, worker, and webhook pods deployed via Helm
queue.bull.redis.host points to a working Valkey instance
EXECUTIONS_MODE=queue and N8N_RUNNERS_ENABLED=true (see the values sketch after this list)
Scaling the worker deployment manually (e.g., to 3 pods) still results in all jobs being processed by a single worker pod (same execution/worker ID in the workflow output)
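For context, the relevant part of my values file looks roughly like this (key paths are illustrative and vary by chart version, so match your chart’s values schema):

main:
  extraEnv:
    EXECUTIONS_MODE: queue
    N8N_RUNNERS_ENABLED: "true"
queue:
  bull:
    redis:
      host: valkey.default.svc.cluster.local  # placeholder; my Valkey service
worker:
  replicaCount: 3  # what I scale manually for testing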
Yeah, the key thing here is that queue mode distributes executions across workers, not items within a single execution. If you’re sending 50 items through one trigger, that’s still just one execution ID going to one worker. You’d need to split those into 50 separate workflow executions (e.g., a webhook that gets called once per item) for the workers to actually share the load.
That explains why I was seeing the same worker ID.
I have now re-tested queue mode on Kubernetes with multiple workers, and manual scaling works perfectly: jobs distribute correctly across workers when I scale the deployment by hand. However, my HPA (Horizontal Pod Autoscaler) is not scaling workers automatically under load.
Worker concurrency: set to 1 for testing
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
Debugging attempts:
Verified metrics-server is running
Checked RBAC permissions
Confirmed resource requests/limits are set (required for HPA; see the sketch after this list)
Manual scaling confirms queue distribution works correctly
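For completeness, the worker requests/limits are set along these lines (numbers trimmed to what I used for testing):

worker:
  resources:
    requests:
      cpu: 250m  # HPA CPU utilization is computed against this request
      memory: 256Mi
    limits:
      cpu: "1"
      memory: 512Mi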
Any insights on getting n8n worker HPA to function properly would be greatly appreciated!
@akshika.sharma
Glad I could help.
Your HPA isn’t scaling because it watches CPU, but n8n workers are I/O-bound: they spend most of their time waiting on external responses, not burning CPU. Even with 10,000 queued jobs, CPU usage may sit around 10%, so the HPA never crosses its 70% threshold. Remember the HPA computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), so at 10% utilization against a 70% target it will, if anything, scale you down.
So you have to trick your current setup:
Drop your targetCPUUtilizationPercentage from 70 to something tiny, like 20.
Drop your pod’s requests.cpu to a very low baseline (like 100m).
Shrinking the baseline means even a tiny amount of real processing registers as a big utilization spike, which finally pushes the HPA over its threshold: with requests.cpu: 100m, just 50m of actual usage already reads as 50% utilization, well past a 20% target.
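Concretely, that tweak would look something like this in your values (numbers are just a starting point; tune them against your real load):

autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 20  # was 70; trips far earlier
resources:
  requests:
    cpu: 100m  # tiny baseline so real work reads as high utilization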
One thing I notice in your latest config is maxReplicas: 1, which means the HPA literally cannot scale beyond one pod even if it wanted to; you’d want to bump that back up to 3 or 4. The CPU-threshold trick might help a bit, but honestly, for queue-based workloads you’re better off looking at KEDA with a Redis scaler that watches the actual Bull queue length. That way pods scale based on how many jobs are waiting rather than trying to infer load from CPU metrics.
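If you go the KEDA route, a minimal ScaledObject sketch could look like this (the Deployment name, address, and queue key are placeholders; n8n’s Bull queue is usually named jobs, but verify the actual key in Valkey, e.g. with KEYS bull:*, since it depends on your n8n version and any QUEUE_BULL_PREFIX override):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: n8n-worker-scaler
spec:
  scaleTargetRef:
    name: n8n-worker  # placeholder; your worker Deployment name
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: valkey.default.svc.cluster.local:6379  # placeholder host:port
        listName: bull:jobs:wait  # Bull's waiting-jobs list; confirm the key first
        listLength: "5"  # aim for ~5 waiting jobs per worker pod

Note that KEDA creates and manages the HPA for you, so you’d disable the chart’s built-in autoscaling block to avoid two controllers fighting over the replica count.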
Hi @akshika.sharma
If kubectl get hpa returns nothing, the chart isn’t creating the HPA resource at all, so this isn’t a queue issue; since manual scaling works and jobs distribute correctly, queue mode itself is fine. I would double-check the chart version and render the chart with helm template to confirm the worker HPA is actually supported and being generated, because in some chart versions autoscaling is only applied to specific components or requires a different values structure.
@akshika.sharma woohoo!
If your question was solved, please consider leaving a like or marking the reply as the solution; it helps others find the answer more easily and also supports community contributors.