n8n - workflows getting unpublished / returning 404 randomly

Hi all,

I’m running n8n on Cloud Run (GCP). It is a very simple setup with executions processing set to main (everything happening in the same process) and PostgreSQL for state/persistence.

I have created an example of my setup here: GitHub - luke-lewandowski/n8n-cloudrun-example (an example deployment of n8n on Cloud Run, Google Cloud Platform).

So, in a nutshell, for those who aren’t familiar with the Cloud Run product: I give it a Docker image to run and a port to expose. As long as requests keep coming through, the container stays alive. It’s no big deal if it dies anyway, as I assume that’s what Postgres is for. However, I’m trying to keep it alive so that I don’t have to wait for the warm-up. I’m using the latest n8n image from Docker Hub.
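For reference, the deployment boils down to roughly the following. This is a sketch, not the exact command from my repo; the service name and region are placeholders:

# Deploy the public n8n image to Cloud Run, exposing n8n's default port.
# Service name and region are placeholders.
gcloud run deploy n8n \
  --image n8nio/n8n:latest \
  --port 5678 \
  --region us-central1 \
  --platform managed \
  --allow-unauthenticated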

The issue is that after a while my webhooks start to return 404, even though they are enabled in n8n.
An example is my webhook twilio/received, which fires when I get an SMS. The workflow shows as enabled in the UI, but when I trigger it via Postman it throws a 404.

e.g.:

{
    "code": 404,
    "message": "The requested webhook \"POST twilio/received\" is not registered.",
    "hint": "The workflow must be active for a production URL to run successfully. You can activate the workflow using the toggle in the top-right of the editor. Note that unlike test URL calls, production URL calls aren't shown on the canvas (only in the executions list)"
}
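If you want to reproduce the check without Postman, the equivalent request is roughly this (the base URL is a placeholder for your Cloud Run service URL; n8n serves production webhooks under the /webhook/ prefix):

# Placeholder URL; production webhooks live under /webhook/<path>.
curl -i -X POST https://<your-service>.run.app/webhook/twilio/received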

I’m not sure why this could be. It seems to be more apparent when I re-deploy (i.e. a new container gets created). I’m not sharing any files/folders between container instances, as Cloud Run is fully stateless, so I can’t be sharing e.g. the .n8n folder. Am I missing something here?

Hey @lemining!

Quick question: is this workflow active? Which webhook URL are you using, the Test URL or the Production URL?

@harshil1712 The workflow is active, and the webhook URL I’m using is the Production URL.

n8n is not the right tool to run on something like Cloud Run; it expects a constantly running server. I guess what happened here is that n8n got stopped at one point and, in doing so, removed all the entries from the webhook database table (as it is supposed to). You can fix that by deactivating and reactivating the workflows. To avoid the problem in the future, you can also set the environment variable N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN to true. That will make sure the table does not get cleared and external webhooks do not get unregistered on shutdown.

Anyway, be aware that we discourage running n8n on such platforms; for that reason we do not support it and cannot offer any help if problems arise.
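A minimal sketch of setting that variable when starting the container with plain Docker (image name per Docker Hub; the port is n8n’s default):

# Start n8n with webhook de-registration on shutdown disabled.
docker run -it --rm \
  -p 5678:5678 \
  -e N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true \
  n8nio/n8n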


Hi @jan ,

Thanks for your reply!

That’s very interesting. I have some questions about that behaviour.

Does that mean that every time the server is shut down and restored (e.g. for an upgrade) you have to go through and re-activate all workflows? Also, is there any documentation on how to host n8n in a scalable fashion? I saw some posts on the community forum, but I was unable to find any concrete documentation that would back them up and show examples (even if it’s on AWS or any other cloud provider).

Thanks,
Luke

From here - Scaling n8n | Docs

My understanding is that you could:

  1. Create a main n8n (purely for managing UI/workflows)
  2. Workers (nodes that get created and killed per request)
  3. Queue is managed in Redis
  4. Overall state is managed in PostgreSQL

This is obviously without digging into the code or anything, just from browsing community threads and the documentation. A rough sketch of what that split might look like is below.
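A minimal sketch, based purely on the queue-mode environment variables from the scaling docs as I understand them (hostnames, credentials, and the shared encryption key are placeholders):

# Main instance: serves the UI and webhooks, pushes executions onto the Redis queue.
docker run -d --name n8n-main -p 5678:5678 \
  -e EXECUTIONS_MODE=queue \
  -e QUEUE_BULL_REDIS_HOST=redis \
  -e DB_TYPE=postgresdb \
  -e DB_POSTGRESDB_HOST=postgres \
  -e N8N_ENCRYPTION_KEY=<shared-key> \
  n8nio/n8n

# Worker: pulls executions from the queue; must share the same database
# and encryption key as the main instance.
docker run -d --name n8n-worker \
  -e EXECUTIONS_MODE=queue \
  -e QUEUE_BULL_REDIS_HOST=redis \
  -e DB_TYPE=postgresdb \
  -e DB_POSTGRESDB_HOST=postgres \
  -e N8N_ENCRYPTION_KEY=<shared-key> \
  n8nio/n8n worker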

No, it is not necessary to reactivate them after an upgrade or restart; n8n handles all of that perfectly fine automatically. It is only a problem if n8n runs in an unsupported way like here, especially where multiple instances may overlap.

You can find the scaling documentation here:

It does not cover AWS or any other cloud provider, but it explains how it is generally possible.

More or less correct. But you have to make sure that (1, main) is always active and running, and (2, workers) too, or you program some kind of logic yourself that spins one up after a request gets received (which will obviously have huge performance implications).

@jan In my setup I just have one container that runs as main, and I keep it up pretty much all the time (through a scheduled task that keeps triggering a workflow to keep it alive).
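(The keep-alive is roughly the following, assuming Cloud Scheduler; the job name and URL are placeholders:)

# Ping the service every 5 minutes so the container stays warm.
gcloud scheduler jobs create http n8n-keep-alive \
  --schedule="*/5 * * * *" \
  --uri="https://<your-service>.run.app/" \
  --http-method=GET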

Obviously the container will go down when I deploy updates.

I’m still not quite understanding why stopping the container would make my workflows unpublished, though?

Sorry, I cannot be of any more help here. Everything important in this matter has been mentioned already:

  1. n8n is currently supposed to unregister webhooks on shutdown and register them again on startup.
  2. If only one instance ever runs at a time, it will all work perfectly fine. That does not seem to be the case here, because otherwise you would not have this problem. There must be some kind of overlap: the new instance starts, and the old one shuts down afterward and so unregisters the webhooks.
  3. I advise you to run n8n on a virtual server (they are available for $5 a month), because that is the only setup we currently support.
  4. I offered a possible solution for your overlap problem: setting N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true.

No need to be sorry, you are super helpful!

  1. Makes sense
  2. The only thing I can think of that would suit this case is that when a deployment happens, I end up with two instances running while the load balancer switches traffic over to the new one.
  3. $5 sounds super attractive - would you have a link?
  4. I will deploy this solution first to see if it helps before I investigate others; the command I plan to use is below.

Thanks heaps for your help!
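For reference, applying the suggested variable to an existing Cloud Run service should be roughly this (the service name is a placeholder):

# Set the variable on the running Cloud Run service.
gcloud run services update n8n \
  --update-env-vars N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN=true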

  1. Yes, if that is the case then this will be the problem. What I mentioned under (4) should help solve it.
  2. The most widely known one is DigitalOcean. Another one which, in my experience, offers instances with better performance is Hetzner. They previously also had instances for as little as 3 euros but raised the prices slightly because of the increased cost of IP addresses.

Just want to throw Kimsufi / OVH into the mix. The Kimsufi range is dirt cheap for a dedicated server that runs n8n well.