Hi guys,
I’m scaling n8n horizontally with multiple workers/instances, and I’m starting to run into problems around shared workflow state and coordination.
Current setup is roughly:
Load Balancer
↓
Multiple n8n Workers
↓
Shared Database/Queue
The issue is that some workflows depend on shared state or sequential processing, and things get tricky when executions land on different workers.
Problems I’m seeing:
• Duplicate processing during retries
• Race conditions between workers
• Inconsistent execution state
• Difficulty coordinating long-running workflows
Some workflows also depend on temporary state like:
{
  "job_id": "123",
  "current_step": 4,
  "status": "processing"
}
Right now I’m debating between:
• Centralized state storage (Redis/DB)
• Workflow-level locking
• Queue partitioning
• Event-driven/state-machine architecture
For teams running n8n at scale across multiple workers:
• How are you managing distributed workflow state safely?
• Are you relying mostly on DB transactions/locks, Redis, queues, or something else?
• Any patterns that helped avoid race conditions and duplicate processing?
Hey there @Selena_Gloria. Once you scale to multiple n8n workers, the biggest rule is: do not rely on in-memory workflow state, because executions can move between workers at any time.
A common production approach is to use a central shared state store like:
• Redis
• PostgreSQL
• Queue system
to track things like:
{
  "job_id": "123",
  "status": "processing",
  "current_step": 4
}
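To make the idea of a central state record concrete, here is a minimal sketch of a compare-and-set update on that record. It assumes a SQL store; in-memory SQLite stands in for the shared PostgreSQL database, and the `job_state` table and `advance_step` helper are hypothetical names, not part of n8n.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the shared PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_state ("
    " job_id TEXT PRIMARY KEY,"
    " current_step INTEGER NOT NULL,"
    " status TEXT NOT NULL)"
)
conn.execute("INSERT INTO job_state VALUES ('123', 4, 'processing')")
conn.commit()


def advance_step(conn, job_id, expected_step):
    """Advance the job only if it is still at expected_step.

    Returns True if this caller performed the update, False if another
    worker already advanced the job (or the job does not exist).
    """
    cur = conn.execute(
        "UPDATE job_state SET current_step = current_step + 1"
        " WHERE job_id = ? AND current_step = ?",
        (job_id, expected_step),
    )
    conn.commit()
    return cur.rowcount == 1


first = advance_step(conn, "123", 4)   # this worker wins the update
second = advance_step(conn, "123", 4)  # a stale worker or retry is rejected
step = conn.execute(
    "SELECT current_step FROM job_state WHERE job_id = '123'"
).fetchone()[0]
```

Because the `WHERE` clause checks the step the worker last saw, two workers acting on the same stale state cannot both advance the job.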
What usually works best:
• Store workflow state centrally
• Use DB locks or unique constraints for deduplication
• Use queues for sequential/rate-limited processing
• Keep workers as stateless as possible
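As a rough illustration of deduplication through a unique constraint, the sketch below lets only the first execution claim a given key. It assumes a SQL store (SQLite here as a stand-in for PostgreSQL); the `processed` table, the `claim` helper, and the key format are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the shared database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")


def claim(conn, key):
    """Return True only for the first caller that claims this key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed (idempotency_key) VALUES (?)", (key,)
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists: a retry or another worker got here first.
        return False


# Simulate the same job step being delivered three times.
results = [claim(conn, "job-123:step-4") for _ in range(3)]
```

A workflow would call something like `claim` before doing the real work and skip the step when it returns False, so retries become no-ops instead of duplicate processing.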
For long-running workflows, many teams break them into:
• Smaller event-driven stages
• Separate queue jobs per step
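One possible shape for the stage-per-job idea is sketched below. Everything here is a stand-in: a dict plays the central state store, a deque plays the shared queue, and the stage names and `handle` function are made up for illustration.

```python
from collections import deque

# Assumptions: a dict stands in for the central state store and a deque
# for the shared queue. In production both would be external services.
state_store = {}
queue = deque()

STAGES = ["fetch", "transform", "deliver"]  # hypothetical stage names


def handle(message):
    """Process one stage, persist progress, then enqueue the next stage."""
    job_id, stage = message["job_id"], message["stage"]
    state = state_store.setdefault(
        job_id, {"completed": [], "status": "processing"}
    )
    if stage in state["completed"]:
        return  # duplicate delivery: this stage already ran, so skip it
    state["completed"].append(stage)  # the real work for the stage goes here
    next_index = STAGES.index(stage) + 1
    if next_index < len(STAGES):
        queue.append({"job_id": job_id, "stage": STAGES[next_index]})
    else:
        state["status"] = "done"


queue.append({"job_id": "123", "stage": "fetch"})
queue.append({"job_id": "123", "stage": "fetch"})  # simulated duplicate delivery
while queue:
    handle(queue.popleft())
```

The handler keeps nothing between calls, so any worker can pick up any message. With a real shared store, the "already completed" check would need to be atomic, for example through a unique constraint or a conditional update.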
Please describe your architecture and environment variables (ENVs).
n8n in queue mode already stores the state centrally in Redis and performs execution locking on the fly, unless you have changed the worker timeouts.
As for the temporary state, move it to an n8n Data Table.
That actually helps clarify a lot, thanks. I am using queue mode, so I may need to review whether I accidentally changed worker timeout/visibility settings during scaling.
Good point about the temporary state too. I was debating between Redis and DB storage, but moving execution-related state into the n8n Data Table sounds much cleaner for consistency across workers.
This is really helpful, thanks. I think my main mistake was still treating some workflow state as “local” to the worker instead of assuming executions can move between instances at any time. The point about keeping workers stateless and storing execution state centrally makes a lot of sense, especially for retries and long-running workflows. I’m also starting to think breaking some of the larger flows into smaller queue-driven stages might simplify a lot of the coordination issues I’m seeing.