K8s n8n installation best practice/best method


I just installed n8n very basically on my Openshift 4.11 cluster and all is well. We have created some processes that work for our business. My question is mainly related to documentation I suppose:

Where is (or should there be) official documentation on how this works on k8s? The documentation mentions, for instance, that you do not need persistent storage if the encryption key is in the ENV and you are using Postgres (which I am), and I would prefer to avoid persistent storage. So why have the option at all if it is not needed? What am I missing out on?
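For reference, the ENV-based setup described above looks roughly like this in a Deployment's container spec. The env var names are n8n's documented ones; the Secret name `n8n-secrets` and the host value are placeholders I made up for illustration:

```yaml
containers:
  - name: n8n
    image: n8nio/n8n
    env:
      - name: N8N_ENCRYPTION_KEY          # must stay stable across pods/restarts
        valueFrom:
          secretKeyRef:
            name: n8n-secrets             # placeholder Secret name
            key: encryptionKey
      - name: DB_TYPE
        value: postgresdb
      - name: DB_POSTGRESDB_HOST
        value: postgres.example.internal  # placeholder host
      - name: DB_POSTGRESDB_DATABASE
        value: n8n
      - name: DB_POSTGRESDB_USER
        valueFrom:
          secretKeyRef:
            name: n8n-secrets
            key: dbUser
      - name: DB_POSTGRESDB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: n8n-secrets
            key: dbPassword
```

With the key in a Secret and all state in Postgres, any pod can decrypt the stored credentials, which is presumably why the PV becomes optional.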

Are there any gotchas with sessions or scaling or when doing a rollout from one set of pods to a new set of pods?

I assume that websocket-based workflows will run fine when scaled, but not cron, as there would need to be some sort of “cluster” to know which instance is going to run the cron job. There is queue mode, but that still bottlenecks everything down to one pod. I regularly need to update the cluster, and I rely on our applications being able to seamlessly survive pod destruction.
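For context, queue mode is switched on with a few env vars shared by the main instance and the workers; a rough sketch (the Redis host is a placeholder):

```yaml
# shared by the main instance and the workers
env:
  - name: EXECUTIONS_MODE
    value: queue
  - name: QUEUE_BULL_REDIS_HOST
    value: redis.example.internal   # placeholder host
  - name: QUEUE_BULL_REDIS_PORT
    value: "6379"
# worker pods run a different command than the main instance:
# command: ["n8n", "worker"]
```

The workers scale horizontally; the bottleneck described above is the single main instance that owns the triggers.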

I did find the 8gears/n8n-helm-chart repo (GitHub - 8gears/n8n-helm-chart: A Kubernetes Helm chart for n8n, a workflow automation tool), but the last commit is over a year ago, and I can see it predates n8n having proper authentication implemented. I also found this from another thread, which again references the basic auth, is heavily opinionated towards Traefik (which I do not use), and does not seem to allude to any scalability.

I would be willing to help with an effort to get this working, particularly in a production-friendly manner like using all non-root containers and perhaps with an operator or new Helm chart. This app is a gem and I would love to support the development.

At the very least I would like to understand how best I should set up this cluster for my use case and faithfully document that so that the next searcher has a thread to go to.

Hi @Mark_Amber, welcome to the community!

We don’t have an official best practice guide for k8s yet, but I believe this is in the works by @deborah.

On a general level we describe options for scaling here: Configuring queue mode - n8n Documentation

@deborah any way I can assist on this?

I would like to move to a more secure setup, so I’m going to have to build my own, more secure n8n container (making it OpenShift / GKE Autopilot compatible), and I think I could be helpful in testing out any draft documentation.
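For concreteness, a restricted pod securityContext along these lines is what OpenShift and GKE Autopilot expect; whether the stock image tolerates it is exactly what needs testing, and the UID here is an assumption, not something the n8n docs state:

```yaml
securityContext:
  runAsNonRoot: true
  # runAsUser: 1000              # assumption: the image works as a non-root UID;
                                 # on OpenShift, omit this and let the SCC assign one
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```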

I’m not sure I completely understand n8n’s scaling of instances yet, but I expect I’ll eventually get it, as I did with persistence. I think there is a fair amount of documentation that references outdated concepts for simple things like setting environment variables (the old authentication model, for instance), which may be a stumbling block. And I would like to know where the right place is to raise issues with the docs.

Hi @Mark_Amber, thanks for the offer! :tada:

Where to raise docs issues: please add issues to the board in the GitHub repo: Issues · n8n-io/n8n-docs · GitHub. The repo also contains general contributor info in the README and wiki. Pull requests are welcome.

For k8s specifically: we currently have a contractor working on three deployment tutorials involving k8s (GCP, AWS, Azure). A first draft of GCP can be viewed here: Draft GCP hosting guide by ChrisChinchilla · Pull Request #892 · n8n-io/n8n-docs · GitHub (I’ll be testing it this week - but if you want to dive in and test, please go ahead, just be aware it’s very much at “draft” stage)

I think this is the start of the docs @deborah ?

As it happens, I was able to get n8n set up on GKE yesterday. That was on Standard, not Autopilot. Not that I “like” Autopilot, but on both Autopilot and OpenShift (which is what we run internally) it’s certainly tougher to get n8n to run, because the Docker container needs root.

I’m still a bit perplexed by the requirement for n8n to have a PV. If this linked Google Cloud doc is the draft you speak of, I think it would be very good to try to get rid of that requirement. If you are mounting a PV that is being written to, it really needs to be a StatefulSet, because k8s will start up new pods before shutting down the old ones. That is assuming the storage provider even supports mounting the storage in such a way… I run n8n without persistent storage and have yet to determine what exactly I am “missing”.
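For what it’s worth, the rollout overlap described above can also be avoided on a plain Deployment by forcing the update strategy, though it trades away zero-downtime updates (a sketch, not a recommendation):

```yaml
spec:
  strategy:
    type: Recreate   # tear down the old pod before starting the new one,
                     # so a ReadWriteOnce volume can be re-mounted; costs downtime
```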

The other thing I did yesterday was get the workers set up. I think in the future it would be good for n8n (and maybe you already have this in your cloud version) to use the Redis cluster to lock shared state so you can run multiple “main” instances (although you use the term “main” twice: once to describe the Node.js execution model and once to describe non-worker, non-webhook instances). I’m not completely certain, but it looks like scaling the non-worker, non-webhook instance may not be possible, which means you will have downtime during updates, because anything triggered by a wall clock would otherwise be fired multiple times. It looks like the multiple webhook instances will take care of things when main boots back up, but any workflow that relies on a wall clock will be repeated if it fires during an update, which would be a problem if the workflow is not idempotent and does not fail early. That could be a big, undetectable error, like copying a deal in a CRM twice and showing an inflated amount of revenue.

I’m curious how this is solved, but even if it is partially solved in the code, in your cloud solution, or in existing documentation, I think it all comes back to using Postgres for scheduling clock-based jobs and using Redis to let the mains determine which one is “primary” (based on knowing which active instance has priority to pick up that wall-clock job) and fire the jobs into Redis for the workers to pick up.
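A sketch of that leader-election idea, using the semantics of Redis’s `SET key value NX PX ttl` but with a hypothetical in-memory stand-in for Redis. All names here are illustrative, not n8n’s actual code:

```python
import time


class LockStore:
    """In-memory stand-in for Redis SET-NX-PX semantics (illustration only)."""

    def __init__(self):
        self._data = {}  # key -> (holder, expires_at)

    def set_nx_px(self, key, holder, ttl_ms, now=None):
        """Acquire `key` for `holder` if it is free or expired. True on success."""
        now = time.monotonic() if now is None else now
        current = self._data.get(key)
        if current is not None and current[1] > now and current[0] != holder:
            return False  # another instance holds a live lock
        self._data[key] = (holder, now + ttl_ms / 1000.0)
        return True


def fire_cron_if_leader(store, instance_id, job_key, now=None):
    """Only the main instance that wins the lock runs the wall-clock job."""
    return store.set_nx_px(job_key, instance_id, ttl_ms=30_000, now=now)


# Two "main" pods race for the same cron tick:
store = LockStore()
a = fire_cron_if_leader(store, "main-a", "cron:2023-01-01T00:00", now=0.0)
b = fire_cron_if_leader(store, "main-b", "cron:2023-01-01T00:00", now=0.0)
print(a, b)  # -> True False: exactly one pod fires the job
```

In production the store would be Redis itself, so the lock is visible to every main instance and survives pod restarts; the TTL guards against a leader dying while holding the lock.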

I know that’s a lot, but I want to see if there are any answers here, or maybe I could help out on any aspect of the setup to avoid the single main process and continue to avoid storage outside of Postgres.


@deborah did you manage to test this guide?

We have been trying to follow other Kubernetes implementation tutorials without success. I will ask our team to try this one; I imagine this issue is really critical to many n8n users! It would be great to have this out there ASAP.



@Sergio_Spieler I should be testing the GCP k8s tutorial soon (hopefully this week)

The GCP tutorial is now live: Google Cloud - n8n Documentation
AWS should also be out soon, with Azure to follow. The GCP/AWS/Azure setups are all fairly similar.
