I’ve been considering how to run a highly available self-hosted version of n8n on AWS that can take advantage of autoscaling while still abstracting all of this from our non-devs.
I’ve considered a few things, like having API Gateway handle a greedy route of /webhook/* and forward it to SQS to be handled, but this feels useless since the code to add a job to the Redis queue isn’t exposed via an API, so I’d need to review the internal code and write a Lambda duplicating it.
Before I go through a large list of design patterns trying to fit the suggestions in the scaling docs, it’d be great to learn if someone else has already established a good pattern for the webhook workers pool, workers pool, and main host.
I suppose using ECS with load balancing for everything might make more sense and be the least effort, but would love to hear what people have done.
If there is any documentation or tutorials for any service where someone has implemented all of the best practices outlined in n8n’s advanced scaling section, that’d also be super helpful.
I keep searching and all of the examples I’m finding are for the most basic implementation.
If nothing pans out, I think API Gateway, ECS, ElastiCache, and the correct load balancing config should be enough. It’s always nice to see what others have done though.
Your question is super interesting. I have on my to-do list a benchmark to really crunch the numbers of how n8n scales. Also as part of this task comes the documentation process.
You got the overall idea, but I’m not sure about ElastiCache. API Gateway is also not mandatory, as a simple Application Load Balancer should be enough.
My recommendation is that you set up ECS with 3 different tasks and a few services:
Postgres 13+ as a shared database for all your workflows
A Redis cluster that will also be used by all your n8n instances
1 task running the n8n default process in queue mode with webhooks disabled. This will cause this "main" process to run forever and get restarted if necessary. This one should not be replicated, i.e. it should be a single instance of n8n
1 task running worker processes that can have multiple instances on the same machine. Each worker is a single process, so running multiple instances helps you make better use of your resources; these can scale based on CPU / network
1 task running webhook processes that can also have multiple copies in the same host. Just like the workers, these are single processes so running multiple instances helps better use your resources
So what happens in the end is that your main n8n instance will be responsible for triggering workflows that are not webhook based, like crons, polls, etc. Everything else will be run on workers or webhook nodes. For this reason, this instance cannot and should not be scaled as this would cause duplication of work.
Your main instance is also your entry point to n8n, allowing you to edit workflows, view executions, etc.
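To make the three roles above more concrete, here is a minimal sketch in Python that builds one ECS-style container definition per role. It is an illustration under assumptions rather than a tested deployment: the hostnames, the image tag, the desired counts for workers and webhooks, and the helper names (SHARED_ENV, ROLES, container_definition) are hypothetical, and the exact command form depends on the image's entrypoint. The environment variable names follow n8n's queue mode configuration; secrets such as the database password and the shared N8N_ENCRYPTION_KEY are left out and would come from a secrets store.

```python
# Sketch only: hostnames, image tag, counts, and helper names are hypothetical.

# Settings every n8n process shares: same database, same Redis queue.
# Every process also needs the same N8N_ENCRYPTION_KEY (omitted here; load it
# from a secrets store rather than hard-coding it).
SHARED_ENV = {
    "EXECUTIONS_MODE": "queue",
    "DB_TYPE": "postgresdb",
    "DB_POSTGRESDB_HOST": "postgres.internal.example",  # hypothetical host
    "QUEUE_BULL_REDIS_HOST": "redis.internal.example",  # hypothetical host
    "QUEUE_BULL_REDIS_PORT": "6379",
}

# One entry per ECS task described in the list above.
ROLES = {
    # Single instance: handles the editor, crons, polls; webhooks disabled.
    "main": {
        "command": ["start"],
        "desired_count": 1,
        "extra_env": {"N8N_DISABLE_PRODUCTION_MAIN_PROCESS": "true"},
    },
    # Scalable pools (the counts here are arbitrary starting points).
    "worker": {"command": ["worker"], "desired_count": 2, "extra_env": {}},
    "webhook": {"command": ["webhook"], "desired_count": 2, "extra_env": {}},
}


def container_definition(role: str, image: str = "n8nio/n8n:latest") -> dict:
    """Return an ECS-style container definition dict for one n8n role."""
    spec = ROLES[role]
    env = {**SHARED_ENV, **spec["extra_env"]}
    return {
        "name": f"n8n-{role}",
        "image": image,
        "command": spec["command"],
        "environment": [
            {"name": key, "value": value} for key, value in sorted(env.items())
        ],
    }


if __name__ == "__main__":
    for role, spec in ROLES.items():
        definition = container_definition(role)
        print(role, spec["desired_count"], definition["command"])
```

The only structural difference between the roles is the command and the flag that disables production webhooks on the main process, which is why the main task stays at a single instance while the other two can be scaled.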
If you have any more questions, feel free to add them. I believe this community post can become a great basis for future documentation on deployment practices, as your questions will help build that document.
Hey @krynble. I appreciate the thorough answer. My team is still evaluating a move from Zapier, primarily because it’s difficult to monitor and we’ve run into a lot of unexpected issues.
I’ll definitely report back if we go through with the implementation so that I can share any learnings we have.
Hey @krynble I’d love to implement your proposed solution, but I have zero knowledge of AWS ECS. I’ve been reading about it lately, but would love some pointers on what to focus on… Fargate vs EC2? Should I worry about load balancing from the beginning, or can it be set up later? How about networking: any considerations about using the default VPC versus creating a new VPC + subnets?
If you can point to any tutorials you’ve used, that would also be appreciated.
I guess using Fargate is easier than EC2 since it’s less stuff for you to manage but maybe it’s slightly more expensive.
About load balancing, I think it’s worth exploring from the start, otherwise the port mapping will be a mess. I strongly suggest you quickly hack together a prototype; the ECS setup is very user friendly and explains step by step how to set it up properly.
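As a rough illustration of what the load balancer needs to decide, the sketch below models the path-based routing discussed earlier in the thread: requests matching /webhook/* go to the webhook pool and everything else goes to the main instance. The function name and the target group labels are hypothetical, and the rule is only a model of a listener rule, not actual ALB configuration.

```python
# Sketch only: models a path-based listener rule; names are hypothetical.

WEBHOOK_PREFIX = "/webhook/"


def target_group_for(path: str) -> str:
    """Pick the target group a request path would be forwarded to."""
    # Production webhook calls go to the scalable webhook pool.
    if path.startswith(WEBHOOK_PREFIX):
        return "webhook"
    # Editor UI, internal API, and any other path go to the single main instance.
    return "main"


if __name__ == "__main__":
    for path in ["/webhook/abc123", "/webhook-test/abc123", "/", "/rest/workflows"]:
        print(path, "->", target_group_for(path))
```

Note that with this prefix match, test webhook paths such as /webhook-test/... fall through to the main instance, since they do not start with /webhook/.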
In regards to networking, the default settings should be ok to start. You can then later migrate your instances around (but this will probably cause some downtime).
@krynble We are trying to do something similar.
We are planning to buy a self-hosted Enterprise license. Can you look at the logical architecture above and let me know how we can fit the multi-main setup into it?
I am not able to find any documentation related to this. How would a multi-main setup of n8n work in an ECS Fargate setup, and how would we need to scale it?
Unfortunately, I haven’t deployed n8n in an AWS environment for a long time, and I have not used AWS Fargate either.
If you’re unsure, I think your best option would be to discuss this with your n8n sales contact (who can liaise with our solution engineering team if necessary).