Set the N8N_ENCRYPTION_KEY variable to make encryption/decryption stateless
Basic Auth
If I run this setup with exactly one container, it works flawlessly, but as soon as I allow Cloud Run to scale the number of instances, I get the following problems:
Test mode doesn’t work reliably, because requests might be routed to another container than the one serving the website which is “waiting to receive request”. This problem, however, would be tolerable.
Of course cron jobs don’t work, since all containers are terminated when no more requests are incoming.
This error often occurs: Error: {"code":404,"message":"The requested webhook \"POST a32…\" is not registered.
Maybe some problem with the activation of newly spun up instances?
So there must be some kind of problem when several n8n instances share the same Postgres database? If so, I think this shouldn’t be a problem. All they should do with the database is load the workflows and write the logs of the events. This itself shouldn’t cause any interference between the n8n instances.
Now if your workflows are based only on webhooks (i.e. external http requests) then you should have no problem with multiple n8n instances sharing the same database. This is the only situation when you can say that n8n is “stateless”.
Still I would recommend you use only 1 persistent n8n instance (started via npm run start or n8n start) and multiple webhook and worker processes as our documentation suggests.
In order to avoid this you must set N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true as an environment variable.
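For anyone who wants to double-check their container config before deploying, here is a minimal sketch of a pre-flight check. It only looks at the two variables mentioned in this thread; the helper name `check_stateless_env` is made up for illustration and is not part of n8n.

```python
# Hypothetical helper, not part of n8n: checks the two variables from this thread.
def check_stateless_env(env):
    """Return a list of problems; an empty list means both settings look fine."""
    problems = []
    # Without a fixed key, each new container could not decrypt stored credentials.
    if not env.get("N8N_ENCRYPTION_KEY"):
        problems.append("N8N_ENCRYPTION_KEY is not set")
    # Strict comparison: the thread uses the lowercase string "true".
    if env.get("N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN") != "true":
        problems.append("N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN is not 'true'")
    return problems
```

Calling `check_stateless_env(os.environ)` (after `import os`) inside the container and printing the result should be enough to spot a missing variable.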
Awesome, that’s what I’m talking about . I’ll give it a try, thanks!
Still I would recommend you use only 1 persistent n8n instance
I’m especially interested in this setup to be able to scale down to 0, because then I could run n8n basically for free if there were only a few webhook calls per day. (And yet still be able to scale up to infinity … or at least to the limits of the Postgres database.)
Please bear in mind that you cannot have workflows triggered with anything other than http requests in this case, otherwise you could have duplication of work, as I mentioned before.
Other than this, you should be on the safe side =)
Hey @Jon it is not officially supported, but with some hacks it is possible.
There are very strong constraints on this kind of usage, which is why we do not officially support it: it can cause some problems (mainly duplication of work), so we prefer the “official scaling methods”.
For webhook calls, with the above-mentioned flag you can run n8n nearly stateless.
Hey @krynble, I am using Cloud Run stateless containers with OP’s setup. Even after setting N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true I still get OP’s error:
This error often occurs: Error: {"code":404,"message":"The requested webhook \"POST a32…\" is not registered .
I also keep one instance running in a VM. The error is still happening. I don’t know if @ad-si has solved this issue.
I read the official docs and watched your YouTube tutorial on scaling n8n. Unfortunately, Cloud Run only triggers on HTTP requests, so I can’t set up workers in queue mode.
Let me know if there is any other information that you need me to clarify.
What happens under the hood on n8n is the following:
Once you activate a workflow that contains any webhook, a new entry is created in a database table called webhook_entity (plus the table prefix, if you make use of that feature).
Whenever any http request to /webhook/* is received, it gets parsed and checked against the above-mentioned table to find which workflow it belongs to.
If no entry is found in the webhook_entity table, you get the error The requested webhook ... is not registered.
On a side note: the webhook_entity record is removed when n8n is shut down for any reason, and also when you disable the workflow manually. That is why we set N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true, so that the record survives a shutdown.
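To make the flow above easier to picture, here is a rough simulation using an in-memory SQLite database as a stand-in for the shared database. This is a sketch under assumptions, not n8n's actual code: the column names, the function names and the "delete everything on shutdown" behavior are simplifications made up for illustration; only the table name webhook_entity and the error text come from this thread.

```python
import sqlite3

def open_db():
    """In-memory stand-in for the shared database (column names are illustrative)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE webhook_entity ("
        " workflow_id TEXT, method TEXT, path TEXT,"
        " PRIMARY KEY (method, path))"
    )
    return db

def activate(db, workflow_id, method, path):
    """Activating a workflow that contains a webhook adds a row."""
    db.execute(
        "INSERT OR REPLACE INTO webhook_entity VALUES (?, ?, ?)",
        (workflow_id, method, path),
    )

def handle_request(db, method, path):
    """Look up an incoming /webhook/* request; return (status, body)."""
    row = db.execute(
        "SELECT workflow_id FROM webhook_entity WHERE method = ? AND path = ?",
        (method, path),
    ).fetchone()
    if row is None:
        return 404, f'The requested webhook "{method} {path}" is not registered.'
    return 200, f"run workflow {row[0]}"

def shutdown(db, skip_deregistration):
    """Simplified shutdown: rows are removed unless the skip flag is set."""
    if not skip_deregistration:
        db.execute("DELETE FROM webhook_entity")
```

If one "instance" calls `shutdown(db, skip_deregistration=False)` while another is still serving traffic against the same `db`, the next `handle_request` returns the 404 from this thread, which is the situation the flag is meant to prevent.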
So if your workflow worked for a while and then stopped working, one of these conditions happened:
Your instance responding to the http request is not connected to the same database (maybe environment variables are wrong and it’s connecting to another database, perhaps even using local sqlite?)
N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true is not set (or is disabled) on one or more of the instances, so when they shut down, one of them removes the record from the database
For the second case, you can enable logging (instructions here: Logging in n8n | Docs), set its level to debug or at least verbose, and watch for a line saying Call to remove all active workflows received (removeAll). If this ever happens, it means that n8n is somehow not following the N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN flag and we should investigate.
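If the logs are long, a small script can do the watching. Below is a minimal sketch that scans log lines for the message quoted above; the function name and the idea of reading the log from a file are assumptions for illustration, and the exact log format may differ between n8n versions.

```python
# The marker text is the log message quoted in this thread.
MARKER = "Call to remove all active workflows received (removeAll)"

def find_deregistrations(lines):
    """Return (line_number, line) pairs for every line containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MARKER in line
    ]
```

For example, `find_deregistrations(open("n8n.log"))` (the file name is hypothetical) returns an empty list if the message never appeared, and otherwise tells you where to look for the matching container shutdown.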
Lastly, I would like to clarify one question: Do your workflows eventually start working again with no manual intervention or do they stop and only work again if you manually reactivate?
I’m also still having issues, but I can’t really boil it down. By now I’m running a single container with the N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true flag and I have a monitor which runs a dummy workflow every 5 minutes. The monitor works flawlessly, but I still have workflow executions displayed as Unknown and get not registered warnings every now and then.
(Cloud Run may restart the container every now and then, but there seems to be no correlation between container restarts and the stated problems.)
Looks a lot like a bug in n8n to me
Definitely, I will try running a similar setup and see how it works and if I can find a reason. As far as we tested with multiple deployment types, it was all working fine.
I would ask you, if possible, to enable logging on n8n and set the logging level to verbose or debug.
This will help us identify if and maybe when the workflows are being deactivated.
@ad-si I did use Cloud Scheduler to ping the webhook every minute (idle instance, warm starts). But I still see it not working when many containers spin up at once, and the ping just hits instances at random.
@krynble Okay, I will turn on the debug level and will let you know.
My initial suspicion went away when you mentioned 2)
Forgive my ignorance. Is it possible that n8n responds to the webhook once the container is ready, before checking any webhook entity in the database? (something like a race condition)
Regarding your last question, about 40-60% start working again with no intervention. 100% working again after I manually reactivate.
n8n only starts accepting http requests after the database connection has been initialized. This can be seen here where on line 191 we make sure the DB has been initialized and then on line 307 we start accepting http requests.
Thanks @krynble, for your speedy fix! I see the PR. Looking forward to when it’s merged.
Just to let you know that I am also trying queue mode. Your video tutorial also helped me to set up n8n in Kubernetes. One pod still needs to be persistent (hence some cost I want to avoid), but I am pretty happy with it. Thank you!
Can sadly not help with your question but just to make it clear again. n8n does not officially support that and you will run into issues and those are expected. Meaning if you run into any issues you are pretty much on your own.
So only do that if you know exactly what you are doing. If you want n8n to run properly, run it on a virtual machine. Those are available for $5 per month from services like DigitalOcean, Hetzner, and others.