Hi everyone,
I’m running a multi-tenant n8n platform where each tenant connects to external APIs with different rate limits.
The challenge is that some tenants have very low limits, while others have much higher quotas.
Current setup: Webhook → Queue → Worker → External API
Problems I’m seeing:
• Some tenants hit rate limits much faster than others
• Retries create bursts of traffic
• One busy tenant can consume a large share of worker capacity
• Difficult to enforce fair usage across tenants
I’m considering:
• Per-tenant queues
• Token bucket / leaky bucket rate limiting
• Redis counters
• Dedicated workers for high-volume tenants
For teams running multi-tenant automations at scale:
• How are you enforcing rate limits per tenant?
• Do you isolate queues by tenant or use a shared queue with throttling?
• Any recommended patterns to prevent one tenant from affecting others while maintaining good throughput?
Hi @Decoure_Ryan, a common approach is to use per-tenant rate limiting instead of one global limit.
Webhook → Queue → Rate Limit Check → Worker → API
Track requests per tenant using Redis or a database, and only process jobs when that tenant is within its allowed limit.
This helps to:
• Prevent one tenant from affecting others
• Handle different API limits per tenant
• Reduce spikes caused by retries
For high-volume tenants:
• Separate queues
• Dedicated workers, so their traffic doesn’t impact everyone else
Welcome to our community, @Decoure_Ryan! I’m Jay, an n8n verified creator.
The most practical n8n-specific approach here is to use a Code node before each external API call to check and update a per-tenant counter. If you’re using Redis, store the key as rate_limit:{tenantId} with an expiry matching the API’s reset window: increment it on each request, and if the tenant is over its limit, route the item to a Wait node instead of proceeding.
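A minimal sketch of that counter logic, assuming a fixed-window counter (the INCR-then-EXPIRE pattern). So it runs without a Redis server, a `Map` stands in for Redis; `FixedWindowCounter`, `checkTenant`, the `tenantLimits` table, and the default of 10 requests per minute are all hypothetical. It is written in TypeScript, but the same logic ports to a JavaScript Code node, where you could return `route` on each item and branch on it with an IF node.

```typescript
// In-memory stand-in for Redis (hypothetical): each key holds a count and an
// expiry, mimicking INCR followed by EXPIRE on the first hit of a window.
type WindowEntry = { count: number; expiresAt: number };

class FixedWindowCounter {
  private store = new Map<string, WindowEntry>();

  incr(key: string, windowMs: number, now: number): WindowEntry {
    const current = this.store.get(key);
    if (!current || current.expiresAt <= now) {
      // Key missing or expired: start a new window at this request.
      const fresh = { count: 1, expiresAt: now + windowMs };
      this.store.set(key, fresh);
      return fresh;
    }
    current.count += 1; // like INCR, over-limit calls still count
    return current;
  }
}

// Hypothetical per-tenant limits; unknown tenants fall back to a default.
const tenantLimits: Record<string, { limit: number; windowMs: number }> = {
  "tenant-a": { limit: 2, windowMs: 60_000 },
  "tenant-b": { limit: 100, windowMs: 60_000 },
};
const defaultLimit = { limit: 10, windowMs: 60_000 };

type Decision = { route: "proceed" } | { route: "wait"; waitMs: number };

function checkTenant(counter: FixedWindowCounter, tenantId: string, now: number): Decision {
  const cfg = tenantLimits[tenantId] ?? defaultLimit;
  const entry = counter.incr(`rate_limit:${tenantId}`, cfg.windowMs, now);
  if (entry.count <= cfg.limit) return { route: "proceed" };
  // Over the limit: wait until this tenant's window expires.
  return { route: "wait", waitMs: entry.expiresAt - now };
}
```

With `tenant-a` allowed 2 requests per minute, a third request 2 seconds into the window returns `wait` with the 58 seconds left in that window, while `tenant-b` is unaffected.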
For the retry burst problem: instead of n8n’s built-in Retry On Fail (which retries after a fixed wait and ignores the API’s Retry-After hint, so it can amplify 429s), route the HTTP Request node’s error output to a Wait node with a fixed delay or a dynamic one taken from the Retry-After response header, then loop back to the request.
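A small helper for the dynamic delay, as a sketch: per RFC 9110, `Retry-After` carries either a number of seconds or an HTTP date, so both forms are handled. The function name and the 5-second fallback (used when the header is missing or unparseable) are assumptions; the result would be converted to whatever unit your Wait node is configured with.

```typescript
// Converts a Retry-After header into a delay in milliseconds. The header is
// either a number of seconds ("120") or an HTTP date (RFC 9110).
function retryAfterMs(header: string | undefined, now: number, fallbackMs = 5_000): number {
  if (!header) return fallbackMs; // header absent: use the assumed default
  const value = header.trim();
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000; // delay-seconds form
  }
  const at = Date.parse(value); // HTTP-date form
  if (!Number.isNaN(at)) {
    return Math.max(0, at - now); // a date already in the past means "retry now"
  }
  return fallbackMs; // unparseable value
}
```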
If you’re on queue mode, per-workflow concurrency limits give you natural per-tenant isolation when each tenant’s jobs run in a dedicated sub-workflow - set concurrency: 1 per tenant workflow and the queue handles the throttling for you without any Redis counter logic.
Here are some things that have worked at scale:
Per-tenant queues are the right call. A shared queue with throttling sounds simple, but it tends to end up with noisy-neighbor issues. Isolated queues give you real isolation without complex priority logic.
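To make that isolation concrete, here is a sketch of per-tenant FIFO queues drained round-robin, so every tenant with pending work gets one turn per cycle regardless of how large its backlog is. It is in-memory, and the class and job shape are hypothetical; a production version would typically keep one Redis list (or broker queue) per tenant.

```typescript
type Job = { tenantId: string; id: string };

// Hypothetical fair scheduler: one FIFO per tenant, served in rotation.
class RoundRobinQueues {
  private queues = new Map<string, Job[]>();
  private order: string[] = []; // tenants that currently have work
  private cursor = 0;

  enqueue(job: Job): void {
    let q = this.queues.get(job.tenantId);
    if (!q) {
      q = [];
      this.queues.set(job.tenantId, q);
    }
    if (q.length === 0) this.order.push(job.tenantId); // tenant rejoins the rotation
    q.push(job);
  }

  dequeue(): Job | undefined {
    if (this.order.length === 0) return undefined;
    if (this.cursor >= this.order.length) this.cursor = 0;
    const tenantId = this.order[this.cursor];
    const q = this.queues.get(tenantId)!;
    const job = q.shift()!;
    if (q.length === 0) {
      // Tenant drained: drop it; the next tenant slides into this slot.
      this.order.splice(this.cursor, 1);
    } else {
      this.cursor += 1;
    }
    return job;
  }
}
```

Enqueueing three jobs for tenant `a` and then one for `b` yields `a1, b1, a2, a3`: `b` is served second instead of waiting behind all of `a`'s backlog.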
For the token bucket implementation, Redis is solid: store one key per tenant and use TTL-based refill. For retries, use exponential backoff with jitter so they don’t all fire at the same time when the rate-limit window resets.
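A sketch of a per-tenant token bucket. Note that this swaps in lazy, timestamp-based refill (each check adds tokens for the time elapsed since the previous check) rather than TTL-based refill; in Redis the `{tokens, updatedAt}` pair would usually live in a hash updated atomically, for example from a Lua script. Capacity and refill rate are constructor arguments, so tenants on different API tiers could each get their own instance. All names are hypothetical.

```typescript
type Bucket = { tokens: number; updatedAt: number };

// Hypothetical per-tenant token bucket with lazy refill.
class TenantTokenBuckets {
  private buckets = new Map<string, Bucket>();
  private capacity: number;
  private refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // Returns true (and spends a token) if the tenant may send a request now.
  tryTake(tenantId: string, now: number): boolean {
    // New tenants start with a full bucket.
    const b = this.buckets.get(tenantId) ?? { tokens: this.capacity, updatedAt: now };
    const elapsedSec = Math.max(0, now - b.updatedAt) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.updatedAt = now;
    this.buckets.set(tenantId, b);
    if (b.tokens >= 1) {
      b.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With a capacity of 2 and a refill of 1 token per second, a tenant can burst two requests, is refused a third, and regains one request a second later; other tenants keep their own buckets.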
For worker allocation, consider a tiered model: a small pool of dedicated workers for the highest-volume tenants and shared workers for everyone else. This avoids over-provisioning while still protecting the top tenants’ throughput.
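One way to express that tiering, as a hypothetical sketch: rank tenants by recent volume and give a dedicated queue only to the top `maxDedicated` tenants that also clear `minVolume`; everyone else goes to the shared queue. The volume metric (jobs per hour), the thresholds, and the queue names are assumptions.

```typescript
// Hypothetical tier assignment: tenant id -> queue name.
function assignPools(
  volumes: Record<string, number>, // recent jobs per hour per tenant (assumed metric)
  maxDedicated: number,
  minVolume: number,
): Map<string, string> {
  // Highest-volume tenants first.
  const ranked = Object.entries(volumes).sort((a, b) => b[1] - a[1]);
  const assignment = new Map<string, string>();
  ranked.forEach(([tenantId, volume], rank) => {
    const dedicated = rank < maxDedicated && volume >= minVolume;
    assignment.set(tenantId, dedicated ? `queue:dedicated:${tenantId}` : "queue:shared");
  });
  return assignment;
}
```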
Something that often gets overlooked: instrument per-tenant queue depth and wait time, not just rate-limit hits. That is usually where you will find the real bottleneck before it becomes an SLA issue.
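A minimal sketch of that instrumentation: count queue depth per tenant and record how long each job waited between enqueue and dequeue. The class is hypothetical and keeps every sample in memory for simplicity; a real setup would export these as metrics (for example a per-tenant gauge for depth and a histogram for wait time) to your monitoring system.

```typescript
// Hypothetical per-tenant queue metrics: depth and enqueue-to-dequeue wait.
class TenantQueueMetrics {
  private depth = new Map<string, number>();
  private waits = new Map<string, number[]>();

  onEnqueue(tenantId: string): void {
    this.depth.set(tenantId, (this.depth.get(tenantId) ?? 0) + 1);
  }

  onDequeue(tenantId: string, enqueuedAt: number, now: number): void {
    this.depth.set(tenantId, Math.max(0, (this.depth.get(tenantId) ?? 0) - 1));
    const list = this.waits.get(tenantId) ?? [];
    list.push(now - enqueuedAt); // time this job sat in the queue
    this.waits.set(tenantId, list);
  }

  snapshot(tenantId: string): { depth: number; maxWaitMs: number; avgWaitMs: number } {
    const list = this.waits.get(tenantId) ?? [];
    const total = list.reduce((sum, w) => sum + w, 0);
    return {
      depth: this.depth.get(tenantId) ?? 0,
      maxWaitMs: list.length ? Math.max(...list) : 0,
      avgWaitMs: list.length ? total / list.length : 0,
    };
  }
}
```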
What type of external APIs are you hitting? Some have burst allowances that can soften the retry problem considerably.
Per-tenant token bucket in Redis is the right instinct, and the key design choice is to enforce the limit before the job reaches the worker, not inside it. If the worker pulls a job and then waits on a rate limit, you have tied up worker capacity doing nothing, which is exactly your “one busy tenant eats everyone’s capacity” problem.
So the shape is: webhook to queue, then a gate that checks the tenant’s bucket in Redis and only releases the job to a worker when that tenant has budget, otherwise it re-queues with a delay. That keeps a throttled tenant’s backlog from blocking others.
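The gate described above, sketched under assumptions: jobs whose tenant has budget are released to the worker queue, and the rest are re-queued with a `notBefore` timestamp so workers never pick up a job only to sit on it. The budget check is passed in as a function (in practice, the Redis token-bucket check); the job shape, `runGate`, and the 1-second re-queue delay are hypothetical.

```typescript
type GateJob = { id: string; tenantId: string; notBefore: number };

// Hypothetical gate between the intake queue and the worker queue.
function runGate(
  intake: GateJob[],
  hasBudget: (tenantId: string, now: number) => boolean, // e.g. a token-bucket check
  now: number,
  requeueDelayMs = 1_000,
): { released: GateJob[]; requeued: GateJob[] } {
  const released: GateJob[] = [];
  const requeued: GateJob[] = [];
  for (const job of intake) {
    if (job.notBefore > now) {
      requeued.push(job); // still inside its delay: leave it untouched
    } else if (hasBudget(job.tenantId, now)) {
      released.push(job); // tenant has budget: hand to a worker
    } else {
      requeued.push({ ...job, notBefore: now + requeueDelayMs }); // try again later
    }
  }
  return { released, requeued };
}
```

Because a throttled tenant's jobs only cycle through the cheap gate, its backlog does not occupy worker slots that other tenants could be using.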
On the retry bursts: your retries need per-tenant backoff too, not a global one, otherwise a tenant that is already at its limit retries into the same wall and amplifies the burst. Exponential backoff with jitter, counted against the same bucket.
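A sketch of per-tenant backoff using exponential backoff with "full jitter" (a random delay between 0 and the exponential ceiling). The attempt counter is kept per tenant; the retried request would still pass through the same token-bucket check before it is sent, and this sketch only computes the delay. The base delay of 1 second, the 60-second cap, and all names are assumptions.

```typescript
// Full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffWithJitter(
  attempt: number,
  baseMs = 1_000,
  capMs = 60_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Attempt counts are kept per tenant, so one tenant's failures never
// lengthen another tenant's delays.
class TenantRetryState {
  private attempts = new Map<string, number>();

  // Call on a 429 or 5xx for this tenant; returns how long to wait.
  nextDelay(tenantId: string, random: () => number = Math.random): number {
    const attempt = this.attempts.get(tenantId) ?? 0;
    this.attempts.set(tenantId, attempt + 1);
    return backoffWithJitter(attempt, 1_000, 60_000, random);
  }

  // Call on success so the tenant's next failure starts from the base delay.
  reset(tenantId: string): void {
    this.attempts.delete(tenantId);
  }
}
```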
For real fairness at hundreds of tenants, separate queues for your few highest-volume tenants and a shared queue for the long tail is usually the pragmatic split. Full per-tenant queue isolation is cleaner but a lot more to operate. One more: put a check on per-tenant queue depth so when one starts backing up past a threshold you find out early, instead of discovering it when that tenant complains their jobs are hours behind.
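That depth check could be expressed as an estimated lag, sketched here under assumptions: given each tenant's queue depth and recent drain rate (jobs completed per minute), depth divided by drain rate approximates how many minutes behind the tenant is, and tenants past `maxLagMin` are flagged, worst first. The stats shape, function name, and the 30-minute default are hypothetical.

```typescript
type TenantQueueStats = { depth: number; drainedPerMin: number };
type LagFlag = { tenantId: string; lagMin: number };

// Hypothetical early-warning check based on estimated backlog lag.
function tenantsFallingBehind(stats: Record<string, TenantQueueStats>, maxLagMin = 30): LagFlag[] {
  const flagged: LagFlag[] = [];
  for (const [tenantId, { depth, drainedPerMin }] of Object.entries(stats)) {
    // Work queued but nothing draining means the tenant is stuck.
    const lagMin = depth === 0 ? 0 : drainedPerMin > 0 ? depth / drainedPerMin : Infinity;
    if (lagMin > maxLagMin) flagged.push({ tenantId, lagMin });
  }
  return flagged.sort((a, b) => b.lagMin - a.lagMin); // worst first
}
```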