408 webhook timeout on a 2-node uptime-check workflow

Describe the problem/error/question

Hello everyone :nerd_face:

My self-hosted N8N instance is on the following server:
32GB RAM, 32 vCPU, 100GB disk space

So it should be more than enough to run the n8n backend properly with 3 workers and 3 webhook instances.

I check the uptime of the server with an external service: a simple workflow consisting of a Webhook node and a Respond to Webhook node is executed every minute. The external service reads the JSON response when the workflow has executed successfully.
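
For reference, the monitoring side boils down to something like this minimal Python sketch. The URL is a placeholder, and the 30-second timeout mirrors the monitoring tool's setting mentioned further down; this is an illustration of the check, not the actual tool:

```python
import socket
import time
import urllib.error
import urllib.request

# Placeholder URL -- substitute your own n8n production webhook URL.
WEBHOOK_URL = "https://example.com/webhook/uptime-check"

def classify(status_code, elapsed_seconds, timeout=30.0):
    """Decide 'up' or 'down' the way an external monitor would:
    any non-200 answer, or a response slower than the timeout, counts as down."""
    if status_code != 200 or elapsed_seconds >= timeout:
        return "down"
    return "up"

def probe(url=WEBHOOK_URL, timeout=30.0):
    """Call the webhook once and classify the result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, time.monotonic() - start, timeout)
    except (urllib.error.URLError, socket.timeout):
        # Connection refused, DNS failure, or timeout: the monitor reports down.
        return "down"
```

A 408 reported by such a monitor therefore says nothing about what n8n answered; it only means no usable answer arrived in time.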

Unfortunately, it happens several times a day that there is a timeout, although the 2-node workflow is the only thing that is executed at that time.

I initially thought it was an error in the external service, but that is not the case. When a timeout is reported, the workflow really is unreachable. The base webhook URL itself responds normally (a check call without the webhook ID, i.e. just $server/webhook).

In the log files, everything looks as if it is running normally. In the following example, there was a downtime from 08:12:06 to 08:15:01; the log shows nothing unusual, but no webhook was reachable during this period.

What is the error message (if any)?

408 - Webhook Timeout

It continues as you can see here …

Please share your workflow

Share the output returned by the last node

Information on your n8n setup

  • n8n version: 1.27.2
  • Database (default: SQLite): Postgres
  • n8n EXECUTIONS_MODE setting (default: own, main): queue
  • Running n8n via (Docker, npm, n8n cloud, desktop app): docker
  • Operating system: hosted @ railway.io

Hi there @rpock :wave: Thanks for all the information you provided!

That 408 doesn’t look like it’s coming from n8n - a couple of questions for you:

  1. Are you using a reverse proxy, or anything else that might be sitting between n8n and what you’re sending the requests from?
  2. Does your monitoring tool have a very low timeout setting? :thinking:

Hi @EmeraldHerald :v: :grinning:

Thank you for your reply!

  • 408 is just the response code for the timeout, so it's not coming from n8n; the monitoring tool simply gets no response within its configured timeout period.
  1. No, requests go directly to the n8n webhook.

  2. The monitoring tool's timeout is 30 seconds, which should be plenty of time for this tiny uptime check :grin:

I’ve already asked the railway support engineers to check if there’s an issue on their side. Here’s the response:

I do not see any indication of network issues at the times you’ve detailed above. Based on this info, I suspect this is an issue at the project-level, perhaps an n8n issue.

Yesterday I got massive downtimes (maybe because I added a second monitor from the US for the uptime check, so there were two checks per minute). These massive downtimes persisted even after I removed the US-side check. Then I restarted the webhook and worker containers, and after that the problem was gone.

Today the same issue as always: sometimes the uptime check is not reachable (so far today, it has only happened once).

By the way, I'm sure the monitoring tool is working properly, because when I get an outage alert I open the uptime-check URL myself: it loads endlessly and then shows no page.

Hi @rpock - Hmm :thinking: Do you have any example setup where the problem could be reproduced outside of Railway so we could dig into this a bit more?

First of all, thank you for your efforts, the support here is always great :pray:

I was able to locate the error. Briefly, the facts and the underlying problem:

Among other things, my n8n server also processes webhook events from a WhatsApp Business provider. The provider offers various events that can be subscribed to in order to trigger the webhook.

So far so good. However, broadcast messages are also sent to the webhook; these are of little interest to me, but they cannot be switched off. (For those who don't know: broadcast messages are the WhatsApp stories you inevitably receive whenever an existing contact posts one.)

The basic problem with broadcast messages is that they contain image or video files that can be relatively large (in the range of several MB).

This is where n8n seems to reach its limits, or at least I have not found a way around it. Processing such a webhook call with a large payload sometimes takes 2-3 minutes, even if the workflow determines right at the start that broadcast messages should not be processed further.

On top of that, these broadcasts arrive several times simultaneously (the full broadcast message is retransmitted for each read state such as received, delivered, …), so several executions run in parallel, sometimes for minutes, blocking the tiny uptime workflow accordingly. This causes the server response to time out because everything is busy.

My Workaround
My workaround is to use my own REST API endpoint as the webhook target for the WhatsApp Business provider. In this endpoint I check whether the event is a regular message and, if so, forward the entire JSON payload to my n8n webhook. This way I can easily filter out broadcasts.
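
A minimal sketch of such a filtering endpoint in Python, under stated assumptions: the field names and the `status@broadcast` pseudo-recipient are placeholders for whatever your WhatsApp Business provider actually sends, and the n8n webhook URL is a placeholder too. Adapt `is_broadcast` to your provider's real payload schema:

```python
import json
import urllib.request

# Placeholder -- substitute your real n8n production webhook URL.
N8N_WEBHOOK_URL = "https://example.com/webhook/whatsapp-inbound"

def is_broadcast(payload: dict) -> bool:
    """Heuristic: treat an event as a story/broadcast if either endpoint is
    the 'status@broadcast' pseudo-contact. Field names ('message', 'to',
    'from') are assumptions -- adjust to your provider's schema."""
    message = payload.get("message", {})
    return ("status@broadcast" in (message.get("to"), message.get("from")))

def handle_event(payload: dict) -> bool:
    """Forward only regular messages to the n8n webhook.
    Returns True if forwarded, False if the event was dropped."""
    if is_broadcast(payload):
        return False  # drop broadcasts before they ever reach n8n
    body = json.dumps(payload).encode()
    req = urllib.request.Request(
        N8N_WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
    return True
```

The point of the design is that the large broadcast payloads are discarded in a cheap process outside n8n, so they never occupy an n8n worker for minutes.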

n8n problem
Processing large payloads sent to an n8n webhook seems to take a very long time and can paralyze the n8n server, or the respective instance, if this gets out of hand.

In theory, any n8n server instance can be paralyzed by sending a few large payloads to a webhook. External abuse can probably be prevented by authentication or other mechanisms, but authentication cannot be implemented everywhere.
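
One cheap mitigation, assuming a small proxy or endpoint of your own sits in front of the webhook as in the workaround above, is to reject oversized bodies before they ever reach n8n. The 1 MB cap below is an arbitrary example value, not a recommendation:

```python
MAX_PAYLOAD_BYTES = 1_000_000  # arbitrary example cap of ~1 MB

def payload_too_large(raw_body: bytes, limit: int = MAX_PAYLOAD_BYTES) -> bool:
    """Return True if a request body should be rejected (e.g. with HTTP 413)
    instead of being forwarded to the n8n webhook."""
    return len(raw_body) > limit
```

If a reverse proxy such as nginx sits in front anyway, its `client_max_body_size` directive achieves the same effect without any application code.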