K8s n8n installation best practice/best method

Hey,

I just installed n8n very basically on my OpenShift 4.11 cluster and all is well. We have created some processes that work for our business. My question is mainly related to documentation, I suppose:

Where is (or should there be) official documentation on how this works on K8s? The documentation mentions, for instance, that you do not need persistent storage if the encryption key is set in the environment and you are using Postgres (which I am), and I would prefer to avoid persistent storage. So why have the option at all if it is not needed? What am I missing out on?
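
For context, my current setup is roughly the following. This is a trimmed-down sketch, not my exact manifest: the secret name, hostnames, and database names are placeholders.

```yaml
# Sketch of a stateless n8n Deployment: encryption key from a Secret and
# Postgres as the database, so no PersistentVolume is mounted at all.
# Names (n8n-secrets, postgres.example.svc) are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n
spec:
  replicas: 1
  selector:
    matchLabels:
      app: n8n
  template:
    metadata:
      labels:
        app: n8n
    spec:
      containers:
        - name: n8n
          image: n8nio/n8n
          ports:
            - containerPort: 5678
          env:
            - name: N8N_ENCRYPTION_KEY   # keeps credentials decryptable across pods
              valueFrom:
                secretKeyRef:
                  name: n8n-secrets
                  key: encryptionKey
            - name: DB_TYPE
              value: postgresdb
            - name: DB_POSTGRESDB_HOST
              value: postgres.example.svc
            - name: DB_POSTGRESDB_DATABASE
              value: n8n
            - name: DB_POSTGRESDB_USER
              valueFrom:
                secretKeyRef:
                  name: n8n-secrets
                  key: dbUser
            - name: DB_POSTGRESDB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: n8n-secrets
                  key: dbPassword
          # note: no volumeMounts, so everything durable lives in Postgres
```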

Are there any gotchas with sessions or scaling, or when doing a rollout from one set of pods to a new set?

I assume that websocket-based workflows will run fine when scaled, but not cron, as there would need to be some sort of “cluster” to know which instance is going to run the cron job. There is queue mode, but this still bottlenecks everything down to one pod. I regularly need to update the cluster, and I rely on our applications being able to seamlessly survive pod destruction.
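
For reference, my understanding is that queue mode itself comes down to a few environment variables on each instance. A sketch, with the Redis service name as a placeholder:

```yaml
# Sketch: environment variables that switch an n8n instance into queue
# mode. redis.example.svc is a placeholder for your Redis service.
env:
  - name: EXECUTIONS_MODE
    value: queue
  - name: QUEUE_BULL_REDIS_HOST
    value: redis.example.svc
  - name: QUEUE_BULL_REDIS_PORT
    value: "6379"
```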

I did find the 8gears/n8n-helm-chart repo on GitHub (“A Kubernetes Helm chart for n8n, a workflow automation tool”), but the last commit is over a year ago, and I can already see it predates n8n having proper authentication implemented. I also found a setup from another thread which again references basic auth, is heavily opinionated towards Traefik (which I do not use), and does not seem to address scalability at all.

I would be willing to help with an effort to get this working, particularly in a production-friendly manner, e.g. using only non-root containers, and perhaps with an operator or a new Helm chart. This app is a gem and I would love to support its development.

At the very least I would like to understand how best to set up this cluster for my use case and faithfully document it, so that the next searcher has a thread to go to.


Hi @Mark_Amber, welcome to the community!

We don’t have an official best practice guide for k8s yet, but I believe this is in the works by @deborah.

On a general level we describe options for scaling here: Configuring queue mode - n8n Documentation

@deborah any way I can assist on this?

I would like to move to a more secure setup, so I’m going to have to build my own n8n container that runs as non-root (and is thus OpenShift / GKE Autopilot compatible), and I think I could be helpful in testing out any draft documentation.
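
Roughly what I have in mind for the container spec is a restricted security context. A sketch; note that OpenShift typically injects an arbitrary non-root UID from the project’s range, so I avoid pinning one:

```yaml
# Sketch: a restricted container securityContext so n8n runs without
# root, which is what OpenShift's restricted SCC and GKE Autopilot
# expect. No UID is hard-coded; OpenShift assigns one per project.
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```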

I’m not sure I completely understand n8n’s instance scaling yet, but with some persistence I’ll eventually get it. I think there is a fair amount of documentation that references outdated concepts for simple things like setting environment variables (the old authentication model, for instance), and that may be a stumbling block. I would also like to know where the right place is to raise issues with the docs.

Hi @Mark_Amber, thanks for the offer! :tada:

Where to raise docs issues: please add them to the issue board in the n8n-io/n8n-docs GitHub repo. The repo also contains general contributor info in the README and wiki. Pull requests are welcome.

For k8s specifically: we currently have a contractor working on three deployment tutorials involving k8s (GCP, AWS, Azure). A first draft of the GCP guide can be viewed here: Draft GCP hosting guide by ChrisChinchilla · Pull Request #892 · n8n-io/n8n-docs. (I’ll be testing it this week, but if you want to dive in and test, please go ahead; just be aware it’s very much at the “draft” stage.)

I think this is the start of the docs, @deborah?

As it happens, I was able to get n8n set up on GKE yesterday. That was on Standard, not Autopilot. Not that I “like” Autopilot, but on both Autopilot and OpenShift (which is what we run internally) it’s certainly tougher to get n8n to run, because the Docker container needs root.

I’m still a bit perplexed by the requirement for n8n to have a PV. If the linked Google Cloud doc is the draft you speak of, I think it would be very good to try to get rid of that requirement. If you are mounting a PV that is being written to, it really needs to be a StatefulSet, because during a rolling update k8s will start new pods before terminating old ones. And that is assuming the storage provider even supports mounting the storage in such a way… I run n8n without persistent storage and have yet to determine what exactly I am “missing”.
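
If a PV does stay in the picture on a plain Deployment, the only safe rollout I know of is to stop the old pod before starting the new one. A sketch:

```yaml
# Sketch: with a ReadWriteOnce PV on a Deployment, the default
# RollingUpdate strategy can deadlock because the new pod cannot attach
# the volume while the old pod still holds it. Recreate terminates old
# pods first, at the cost of brief downtime during each rollout.
spec:
  strategy:
    type: Recreate
```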

The other thing I did yesterday was get the workers set up. I think it would be good in future for n8n (and maybe you already have this in your cloud version) to use the Redis cluster to lock shared state so you can run multiple “main” instances (although you use the term “main” twice: once to describe the Node.js execution model and once to describe non-worker, non-webhook instances).

I’m not completely certain, but it looks like scaling the “non-worker, non-webhook” instance is possibly impossible, which means you will have downtime during updates, because anything triggered by a wall clock would otherwise be fired multiple times. It looks like the multiple webhook instances will take care of things when main boots back up, but any workflow that relies on a wall clock will be repeated if it fires during an update, which would be a problem if the workflow is not idempotent or does not fail early on. That could be a big, undetectable error, like copying a deal in a CRM twice and showing an extra-large amount of revenue.
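
For anyone following along, the workers I set up are just additional pods running the same image with the worker command, pointed at the same Redis and Postgres. A sketch; the shared-env Secret name is a placeholder:

```yaml
# Sketch: a queue-mode worker Deployment. Same image and configuration
# as the main instance, but started with `n8n worker` so it only
# consumes jobs from the Redis queue. n8n-env is a placeholder Secret
# holding the shared DB/Redis/encryption-key variables.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-worker
spec:
  replicas: 2                 # workers scale horizontally
  selector:
    matchLabels:
      app: n8n-worker
  template:
    metadata:
      labels:
        app: n8n-worker
    spec:
      containers:
        - name: worker
          image: n8nio/n8n
          command: ["n8n"]
          args: ["worker"]
          envFrom:
            - secretRef:
                name: n8n-env
```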

I’m curious how this is solved, but even if it is partially solved in the code, your cloud solution, or existing documentation, I think it all comes back to using Postgres for scheduling clock-based jobs and using Redis to let the mains determine who is going to be “primary” (based on knowing which active instance has priority to pick up a given wall-clock job) and fire those jobs back into Redis for workers to pick up.

I know that’s a lot, but I want to see if there are any answers here, or maybe I could help out on any aspect of the setup to avoid the single main process and continue to avoid storage outside of Postgres.


@deborah did you manage to test this guide?

We have been trying to follow other Kubernetes implementation tutorials without success. I will ask our team to try this one. I imagine this issue is really critical for many n8n users! It would be great to have this out there ASAP.

Tks


@Sergio_Spieler I should be testing the GCP k8s tutorial soon (hopefully this week)

The GCP tutorial is now live: Google Cloud - n8n Documentation
AWS should also be out soon, with Azure to follow. The GCP/AWS/Azure setups are all fairly similar.


Would love to collaborate - do you have a repo that I can use as a jumping-off point?

I was able to piece things together and have some templates. There is a Redis operator I used, and I use the Crunchy Postgres Operator to handle creating the DB, the secret for the DB, and backups of the DB.

Crunchy is fairly easy to set up once you decide how to do backups and get the secret defined.
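
Once the operator is installed, the Crunchy side boils down to a single custom resource. A sketch from memory of the v5 PostgresCluster CRD; version numbers, sizes, and names are placeholders:

```yaml
# Sketch: minimal Crunchy PGO v5 PostgresCluster. The operator creates
# the database, a connection Secret (named n8n-db-pguser-n8n-db by the
# operator's convention), and pgBackRest backups.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: n8n-db
spec:
  postgresVersion: 14
  instances:
    - name: instance1
      replicas: 2                       # HA pair across nodes
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 5Gi
```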

The Redis one is a bit annoying because you need to set up RBAC rules (even though this probably could be done without them).

The n8n setup depends on whether you want to separate the webhook and worker units and on how your ingress works.

In my case, on OpenShift, there is the Route abstraction, which handles routing the webhook paths to the right service.
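
Concretely, something like this. A sketch; the host and service names are placeholders, and I’m assuming n8n’s default /webhook path:

```yaml
# Sketch: an OpenShift Route that sends webhook traffic to a dedicated
# webhook Service, while a separate Route (not shown) covers the UI.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: n8n-webhooks
spec:
  host: n8n.example.com
  path: /webhook            # n8n's default production webhook path
  to:
    kind: Service
    name: n8n-webhook       # placeholder for the webhook-instance Service
  port:
    targetPort: http
  tls:
    termination: edge
```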

I know this topic is old, but clearly you found it. Do you have any specific questions about your setup or my setup that I could help with? And whatever discussion we have here will be there for all to see.

It comes down to how you want it. Instead of just using a simple Docker Compose setup, you feel a need to go Kubernetes. OK, why? For me, I embrace Kubernetes because it allows me to handle all my infrastructure like cattle and manage one resilient cluster on crappy, cheap, used servers. This provides my organization with extremely high throughput on our (almost completely) self-hosted software for little more than the cost of the internet we already needed for the office. It allows us to architect our stuff extremely sloppily and consume a ton of resources. So my implementation is going to be way more heavy-handed than that of someone trying to use GKE and minimize costs.

It’s way over-engineered for the scale I need it to operate at, but I have reached the level I desired: being able to completely lose one of my three hosts at any time without really messing up anyone’s workflows.

The problem with Helm / Kustomize from vendors is that it ignores things like DB backups, the fact that you might want your ingress to handle certificates, and so on. I find that after a year of working with Kubernetes you really get to know it and how to stand up what you need. Maybe a blog post showing the required architecture would be optimal.


Thanks for your detailed breakdown! I’ve also managed to put together a decent-ish deployment with Helm: queue mode with Redis, some fallback nodes, etc. Handling stuff like DB migrations was exactly next on my to-do list. Thanks for sharing those resources!

I’m pretty new to Kubernetes (been messing with it for about a week), so let’s see how it goes. My requirements aren’t quite as hardcore as yours for sure, but I do still want my n8n deployment and all its bells and whistles to run smoothly and be manageable.

Although you are self-hosting everything, do you have any thoughts on where/how I should go about hosting the entire stack? Currently I’m running an external AWS RDS database and have packaged Redis, n8n, and its worker config into a Helm chart, which I’m then deploying to AWS EKS. Would appreciate your advice - thanks!

PS: I would pay to read that blog post! :pray:

I have created a Helm chart for scalable n8n; take a look at GitHub - a5r0n/n8n-chart: Helm Chart for n8n.io