Run n8n on stateless containers

I have been trying to deploy n8n on stateless containers (Google Cloud Run in particular) over the last few days, but without much success so far. (There is also an issue on GitHub for it: https://github.com/n8n-io/n8n-docs/issues/609, but I was referred to the forum.)

The setup looks like this:

  • Postgres database on Cloud SQL
  • N8N_ENCRYPTION_KEY variable set, so that encryption and decryption are stateless
  • Basic Auth
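
For readers who want to reproduce the setup, here is a minimal sketch of the environment such a container might receive. The variable names are the ones I recall from the n8n documentation of that time (DB_POSTGRESDB_* and N8N_BASIC_AUTH_*), so please check them against your n8n version. The helper build_n8n_env and all values are hypothetical placeholders, not part of n8n.

```python
def build_n8n_env(db_host, db_name, db_user, db_password,
                  encryption_key, auth_user, auth_password):
    """Return the environment for one n8n container as a plain dict.

    Hypothetical helper: it only assembles and validates settings,
    it does not talk to n8n or to Cloud Run.
    """
    env = {
        # Postgres on Cloud SQL instead of the default local SQLite file
        "DB_TYPE": "postgresdb",
        "DB_POSTGRESDB_HOST": db_host,
        "DB_POSTGRESDB_DATABASE": db_name,
        "DB_POSTGRESDB_USER": db_user,
        "DB_POSTGRESDB_PASSWORD": db_password,
        # Fixed key, so every container can decrypt the stored credentials
        "N8N_ENCRYPTION_KEY": encryption_key,
        # Basic Auth in front of the editor (variable names assumed)
        "N8N_BASIC_AUTH_ACTIVE": "true",
        "N8N_BASIC_AUTH_USER": auth_user,
        "N8N_BASIC_AUTH_PASSWORD": auth_password,
    }
    missing = [key for key, value in env.items() if not value]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    return env


env = build_n8n_env(
    db_host="placeholder-host",
    db_name="n8n",
    db_user="n8n",
    db_password="placeholder-password",
    encryption_key="placeholder-key",
    auth_user="admin",
    auth_password="placeholder-password",
)
# Print only the names, never the secret values
for key in sorted(env):
    print(key)
```

The point of the sketch is that nothing in this list lives on the container's disk, which is what makes a single instance work without local state.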

If I run this setup with exactly one container, it works flawlessly, but as soon as I allow Cloud Run to scale the number of instances, I get the following problems:

  • Test mode doesn’t work reliably, because requests might be routed to a container other than the one serving the website, which is the one “waiting to receive request”. This problem, however, would be tolerable.
  • Of course cron jobs don’t work, since all containers are terminated when no more requests are incoming.
  • This error often occurs:
    Error: {"code":404,"message":"The requested webhook \"POST a32…\" is not registered.
    Maybe some problem with the activation of newly spun up instances?

So is there some kind of problem when several n8n instances share the same Postgres database? I don’t see why there should be. All they should do with the database is load the workflows and write the logs of the events. This by itself shouldn’t cause any interference between the n8n instances.

As @krynble says in the GitHub issue:

Now if your workflows are based only on webhooks (i.e. external http requests) then you should have no problem with multiple n8n instances sharing the same database. This is the only situation when you can say that n8n is “stateless”.

So why doesn’t it work then? :sweat_smile:

All right, let’s continue!

Ok, now I know what is happening.

If you have multiple instances, whenever any of them is shut down it will de-register the webhook endpoint.

In order to avoid this you must set N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true as an environment variable. You can see the description for this flag here: https://github.com/n8n-io/n8n/blob/1f71e69ed881142c417b6e12533783ac24cc2e45/packages/cli/config/index.ts#L516
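
To make the failure mode concrete, here is a toy model of it. It is only a sketch under the assumption that webhook registrations live in state shared by all instances (the set below stands in for the shared database); the class and function names are hypothetical and do not mirror n8n's internals.

```python
class Instance:
    """One n8n-like process attached to a shared webhook registry."""

    def __init__(self, registry, skip_deregistration_on_shutdown):
        self.registry = registry  # shared by all instances
        self.skip = skip_deregistration_on_shutdown

    def start(self, webhooks):
        # Activating workflows registers their webhooks in shared state
        self.registry.update(webhooks)

    def shutdown(self, webhooks):
        # Without the skip flag, shutting down removes the registrations
        # for everyone, not just for this instance
        if not self.skip:
            self.registry.difference_update(webhooks)

    def handle(self, webhook):
        return 200 if webhook in self.registry else 404


def simulate(skip):
    registry = set()
    hooks = {"POST example-path"}
    a = Instance(registry, skip)
    b = Instance(registry, skip)
    a.start(hooks)
    b.start(hooks)
    a.shutdown(hooks)  # the platform scales instance A away
    return b.handle("POST example-path")  # B is still serving traffic


print(simulate(skip=False))  # 404
print(simulate(skip=True))   # 200
```

With the flag off, the surviving instance answers 404 for a webhook it never deactivated itself, which matches the "is not registered" error reported above.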

Still I would recommend you use only 1 persistent n8n instance (started via npm run start or n8n start) and multiple webhook and worker processes as our documentation suggests.

In order to avoid this you must set N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true as an environment variable.

Awesome, that’s what I’m talking about :raised_hands::grin:. I’ll give it a try, thanks!

Still I would recommend you use only 1 persistent n8n instance

I’m especially interested in this setup because it can scale down to 0. That way I could run n8n basically for free if there were only a few webhook calls per day. (And yet still be able to scale up to infinity … or at least to the limits of the Postgres database)

I didn’t think n8n was officially supported on Stateless Containers :thinking:

I see, that is a very interesting approach.

Please bear in mind that you cannot have workflows triggered with anything other than http requests in this case, otherwise you could have duplication of work, as I mentioned before.
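
A rough way to picture the duplication, as a toy model only: a timer-based trigger runs inside every instance, while an HTTP request is handed to exactly one instance by the load balancer. The function names below are hypothetical and nothing here calls n8n.

```python
import random


def run_cron_tick(instance_count):
    """Each instance has its own timer, so each one starts an execution."""
    return [f"instance-{i}" for i in range(instance_count)]


def route_webhook(instance_count, rng):
    """The load balancer picks a single instance for each request."""
    return [f"instance-{rng.randrange(instance_count)}"]


rng = random.Random(0)
cron_runs = run_cron_tick(3)
webhook_runs = route_webhook(3, rng)

print(len(cron_runs))     # 3 executions for one scheduled tick
print(len(webhook_runs))  # 1 execution for one incoming request
```

So with three instances a scheduled workflow would do its work three times, whereas a webhook-triggered workflow still runs once per request.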

Other than this, you should be on the safe side =)

Good luck and let us know of your results!

Hey @jon it is not officially supported, but with some hacks it is possible.

There are very strong constraints on this kind of usage, which is why we do not officially support it: it can cause some problems (mainly duplication of work), so we prefer to use the “official scaling methods”.

For webhook calls, with the above-mentioned flag you can run n8n as nearly stateless.

That is interesting to know. I am not sure if I will ever have that requirement, as I prefer the idea of always having a main instance running.
