Webhook return of large payload 'fails' in queue mode configuration for production, succeeds in test

As part of building an interim solution for a legacy integration, I’m implementing webhooks to act as wrappers around API calls to an internal system, mimicking the web service calls provided by an older system.

One of these is a file retrieval webhook i.e. it returns a binary document retrieved from an internal API call.

This works fine for files up to about 19 MB in size. However, I’m finding that when the documents exceed this size, although the flow execution succeeds (i.e. there are no errors shown in the executions list), the return from the webhook call is simply the message {"message":"Workflow executed successfully"}.

I’ve got to the point where I think it’s related to how n8n works in queue mode (or to our deployment of n8n in queue mode), but I can’t be 100% sure. My tests (using version 0.220.0) show:

Standalone n8n, single node, SQLite DB

  • works in test mode
  • works in prod mode

Our Staging environments, which use a Redis queue with one worker node and a PostgreSQL database

  • works in test mode
  • fails in prod mode (i.e. the return payload from the webhook call is the 44-byte {"message":"Workflow executed successfully"} string)

Note - I don’t have control over, or visibility of, any of the underlying details of the Staging environment, so this is a guess, but when running a webhook flow in test mode, is it handled purely by the main instance, i.e. do the Redis queue and worker nodes not get a look-in?

An example workflow that shows the issue is below (I’ve replaced the internal API calls with an HTTP GET to a publicly available large file to illustrate the problem). A cURL command to the webhook-test URL works; a cURL command to the production webhook URL “fails”.
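For reference, the two calls look roughly like this (the hostname and webhook path below are placeholders, not our real ones):

```bash
# Test URL - the flow runs on the main instance; the full binary document comes back
curl -o file-test.bin https://n8n.example.internal/webhook-test/file-retrieval

# Production URL - the flow runs via the queue; only the 44-byte JSON message comes back
curl -o file-prod.bin https://n8n.example.internal/webhook/file-retrieval
```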

Hi @NickW, I am sorry you are having trouble.

This works fine for files up to about 19 MB in size. However, I’m finding that when the documents exceed this size, although the flow execution succeeds (i.e. there are no errors shown in the executions list), the return from the webhook call is simply the message {"message":"Workflow executed successfully"}.

I’ve got to the point where I think it’s related to how n8n works in queue mode (or to our deployment of n8n in queue mode), but I can’t be 100% sure. My tests (using version 0.220.0) show:

Hm, I am not aware of any hard limit, but I can already see my laptop struggling due to a lack of memory when running your workflow locally. This is even on the current 1.5.1 version of n8n (as n8n would process all binary data in memory and save execution data in the database when using queue mode).

Is there a chance you are not saving execution data when manually executing your workflow (which works), but are trying to store execution data for your production execution (triggering a large and possibly failing database transaction)?
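For reference, these are the instance-wide defaults I mean (a sketch using the standard n8n environment variable names; the settings of an individual workflow can override them):

```bash
# Instance-wide defaults for storing execution data (workflow settings can override these)
export EXECUTIONS_DATA_SAVE_ON_SUCCESS=all          # keep data for successful production executions
export EXECUTIONS_DATA_SAVE_ON_ERROR=all            # keep data for failed production executions
export EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS=true  # keep data for manual (test) executions
```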

Or do you have different amounts of memory available on your main instance compared to your worker instance?

when running a webhook flow in test mode, is it handled purely by the main instance, i.e. do the Redis queue and worker nodes not get a look-in?

Yes, a manual execution would take place on the main instance rather than sent to a worker.

Hi @MutedJam
Many thanks for the feedback and info.

In answer to your questions:

Is there a chance you are not saving execution data when manually executing your workflow (which works), but are trying to store execution data for your production execution (triggering a large and possibly failing database transaction)?

I don’t think so - the workflow settings show “Yes” for saving execution data in both production and manual modes.

Or do you have different amounts of memory available on your main instance compared to your worker instance?

Both are configured with 4GB memory.

Based on the feedback regarding manual executions running on the main instance, the ops people kindly (and temporarily) re-configured the DEV instance to switch off queuing, i.e. no worker instances, with all flows processed only by the main instance in all cases.

With queuing switched off, the workflow ran successfully in both test and production modes - so now I’m pretty sure the issue is related to how queuing works, or to how information is transferred between the instances in queue mode.
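For anyone following along, my understanding is that the two configurations differ roughly like this (a simplified sketch of the relevant environment variables, not our exact deployment; the Redis hostname is a placeholder):

```bash
# Normal Staging/DEV setup: queue mode, the main instance hands executions to workers via Redis
export EXECUTIONS_MODE=queue
export QUEUE_BULL_REDIS_HOST=redis.example.internal  # placeholder

# Temporary DEV reconfiguration: queue mode off, everything runs on the main instance
export EXECUTIONS_MODE=regular
```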

I’m not quite sure what I can do here, but is there anything/anywhere I can get the devs to look at to determine what might be going on?

Cheers,
Nick

I’m not quite sure what I can do here, but is there anything/anywhere I can get the devs to look at to determine what might be going on?

I am afraid there is nothing I am aware of. Can you confirm whether you’re also seeing this problem using the current version 1.5.1 of n8n and using the latest version of the HTTP Request node (simply remove the old HTTP Request node and add a new one from scratch after upgrading your n8n version)?

Hiya,
Unfortunately I can’t quickly get 1.5.1 deployed to test this out in queue mode (we have an internal engineering cycle that would require planning and resource commitments). And I’m neither sufficiently tech-savvy, nor do I have the infrastructure, to try deploying 1.5.1 with queueing myself.

Many thanks though for the information and feedback - it’s been very helpful. I’ll look into other options for now, and add to this thread (or open a new question) if I get to the point of trying out 1.5.1 and see a similar issue.

Best regards,
Nick

Hi @NickW, you’re most welcome!

Also, on a possibly related note, @BramKn and @pemontto recently discovered a massive performance decrease with the slow query logging enabled. Further investigation revealed that this also blows up the memory consumption a lot (as the parameters of each slow query are being logged, and the ORM we have in use applies additional logic, such as color-formatting).

This log type will soon be disabled by default. However, seeing that updating n8n is somewhat complicated for you, perhaps you also want to disable it manually on your instance (by setting the DB_LOGGING_MAX_EXECUTION_TIME=0 env variable) and verify if this improves the situation?
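For example, if you can pass environment variables to your instances, it would be something along these lines (set it on both the main and the worker instances):

```bash
# Disable slow-query logging entirely (0 = off)
export DB_LOGGING_MAX_EXECUTION_TIME=0
```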


Many thanks indeed! I’ll pass that on straight away.
