Significant delay before webhooks trigger

Describe the problem/error/question

The delay between receiving a webhook and the workflow starting is too long: sometimes over 90 seconds. The webhook executions stay in “new” status for that long:

… and while our N8N_CONCURRENCY_PRODUCTION_LIMIT is 75, there are only 27 executions in the screenshot above.

This issue is blocking us from working with applications that require quick responses.

Information on your n8n setup

    • n8n version: 1.115.3

    • Database (default: SQLite): using RDS Postgres and EFS for storing n8n data

    • Running n8n via (Docker, npm, n8n cloud, desktop app): self-hosted on AWS Fargate (Docker)

hello @theo

Check the resource usage. n8n may be overwhelmed and thus respond slowly.

hello @barn4k

Thanks for your answer.

We’re always under 40% CPU and memory utilization, so I guess it’s fine.

Any other idea?

Thank you

We’re now on v1.121.2 and the problem remains.

Perhaps @Jon, @ivov or @Gallo_AIA you would have a clue?

Thanks

This is just a limit.

How many workers have you set up and started? (Each of them needs its own concurrency as well.)

Your env file would be helpful to debug first…

Thanks @Parintele_Damaskin here without the sensitive information:

[
  { "name": "N8N_LOG_FILE_MAXSIZE", "value": "32" },
  { "name": "NODE_OPTIONS", "value": "--max-old-space-size=8192" },
  { "name": "N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN", "value": "true" },
  { "name": "EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS", "value": "false" },
  { "name": "QUEUE_RECOVERY_INTERVAL", "value": "0" },
  { "name": "N8N_DISABLE_PRODUCTION_MAIN_PROCESS", "value": "false" },
  { "name": "EXECUTIONS_DATA_MAX_AGE", "value": "960" },
  { "name": "N8N_BASIC_AUTH_ACTIVE", "value": "true" },
  { "name": "GENERIC_TIMEZONE", "value": "Europe/Paris" },
  { "name": "EXECUTIONS_MODE", "value": "queue" },
  { "name": "QUEUE_BULL_REDIS_HOST", "value": "redacted" },
  { "name": "EXECUTIONS_DATA_PRUNE_MAX_COUNT", "value": "300000" },
  { "name": "N8N_CONCURRENCY_PRODUCTION_LIMIT", "value": "75" }
]
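For context on how these variables come together in queue mode, here is a minimal sketch: the main instance and every worker need the same EXECUTIONS_MODE and Redis settings, and each worker gets its own concurrency. The Redis host below is a placeholder, since the real value is redacted above.

```shell
#!/bin/sh
# Sketch only: queue-mode settings shared by main and workers.
# QUEUE_BULL_REDIS_HOST is a placeholder; use your real Redis host.
export EXECUTIONS_MODE=queue
export QUEUE_BULL_REDIS_HOST=redis.internal
export QUEUE_HEALTH_CHECK_ACTIVE=true

# A worker would then be started with its own concurrency, e.g.:
#   n8n worker --concurrency=10
echo "mode=$EXECUTIONS_MODE redis=$QUEUE_BULL_REDIS_HOST"
```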

Besides EXECUTIONS_DATA_PRUNE_MAX_COUNT = "300000" and EXECUTIONS_DATA_MAX_AGE = "960", which may accumulate a large number of execution records and can slow down performance if not properly managed, the variables look OK.

Also set

QUEUE_HEALTH_CHECK_ACTIVE=true, then access the endpoints:

http://your_host:port/healthz

http://your_host:port/healthz/readiness

Another best practice would be a load balancer in front of the webhooks:

P.S. I am curious how you set up and start your workers, e.g.:

n8n worker --concurrency=5

Thank you @Parintele_Damaskin!

We did set QUEUE_HEALTH_CHECK_ACTIVE to true and increased the concurrency up to 75!

We saw a positive impact on execution speed, but queues still accumulated up to around 70 executions, waiting up to 2:30 minutes before everything executed within a second! Very strange behavior IMO.
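For what it's worth, the "everything executes within a second" part is consistent with a backlog draining in waves once capacity frees up: roughly ceil(backlog / effective concurrency) waves of sub-second executions. A sketch with assumed numbers (only the 70 comes from the observation above):

```shell
#!/bin/sh
# Assumed numbers for illustration only.
backlog=70      # executions observed waiting in the queue
effective=75    # assumed effective concurrency once slots free up

# Integer ceiling division: waves of parallel execution to drain the backlog.
waves=$(( (backlog + effective - 1) / effective ))
echo "waves=$waves"
```

With a backlog smaller than the effective concurrency, everything fits in a single wave, which would explain the sudden drain.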

I understand EXECUTIONS_DATA_PRUNE_MAX_COUNT is quite high, but it corresponds to around 15 days of history at our volume :confused:

This issue is really annoying, we’re thinking of creating another specific instance dedicated to webhooks that need faster execution.

If you have any other thoughts and suggestions, it could be really helpful!

Before doing that, check this resource, which is actually recommended for scaling:

I would choose this path, since it means reworking the workflows to process less data and respond faster (e.g. returning a 200 immediately), etc…

Btw, my approach is the following: webhook processors + queue mode + workers.

Main is kept out of the webhook pool (optionally with production webhooks disabled on main).

This effectively gives webhook traffic “priority” in the form of dedicated capacity for intake.

I don’t know if it is the best, but for my case it does the job well.

Cheers!

We ran multiple workers for quite a while, but that caused duplicated executions. (See this other discussion.)

Regarding webhook processors, this is something we don’t want, since we need not only a quick response but also a quick execution of the called process.

It really feels like there’s a blind spot here: an important bug that many should be hitting, yet I don’t find many other discussions in the community. This is weird.

Ok, I got your point…

Can you, for example, share a workflow that you think takes too long to respond?

Maybe it needs restructuring so you don’t deal with longer waiting times, and maybe use batching if possible… Etc…

Cheers!

Thank you @Parintele_Damaskin
There are actually multiple workflows queued, all triggered by webhooks, and some are quite short. On average, they all complete in under 1 second once they start. It really seems like an n8n behavior, but I’m not sure how it’s configured this way.


Ok… I reviewed the docs again, and tried to understand this from all perspectives, but it’s more a combination of:

How queue mode + workers behave.

How webhooks are handled.

And how concurrency and load are configured.

In queue mode each execution (including sub-workflows started via Sub-workflow nodes) is processed end‑to‑end by a single worker. Deep sub-workflow chains are deliberately kept on the same worker… ok… If you instead trigger sub-workflows via webhooks, each becomes its own execution and can go to a different worker…

Webhook processors are just another way to scale incoming webhook traffic in queue mode; they still rely on Redis and EXECUTIONS_MODE=queue… ok… The main trade‑off is separation of concerns and the ability to scale receivers vs executors independently… now my brain starts allocating more resources lol…
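To make the receiver/executor split concrete, here is a hedged sketch of the process layout in queue mode (all processes point at the same Redis; the counts and concurrency are arbitrary assumptions, and the commands are only echoed here, not run):

```shell
#!/bin/sh
# Sketch of the queue-mode process split:
#   main:     n8n start                      # UI + scheduling
#   webhook:  n8n webhook                    # dedicated webhook receivers
#   worker:   n8n worker --concurrency=10    # executors
# Executors can then be scaled independently of receivers, e.g.:
workers=3
i=1
while [ "$i" -le "$workers" ]; do
  echo "would start: n8n worker --concurrency=10   # worker $i"
  i=$((i + 1))
done
```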

Quoting:

Proper queue‑mode setup (main + workers + possibly webhook processors + Redis)

“…ok, prepare the math and your coffee…

It means that concurrency is still the main limiter in queue mode: workers pull jobs from Redis and run up to their concurrency limit in parallel. If the number of incoming webhook executions temporarily exceeds your effective concurrency (workers × concurrency per worker, capped by N8N_CONCURRENCY_PRODUCTION_LIMIT), executions will accumulate as queued until capacity frees up, and then they are processed very quickly…

As a result, very low concurrency with many workers can overload the DB, so the right number is infrastructure‑dependent…
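The capacity math quoted above can be sketched as a quick calculation. The worker count, per-worker concurrency, and burst size below are assumptions for illustration; only the cap of 75 comes from the config shared in this thread:

```shell
#!/bin/sh
# Assumed deployment figures for illustration.
workers=3
per_worker=10
limit=75        # N8N_CONCURRENCY_PRODUCTION_LIMIT from the env above
incoming=90     # hypothetical burst of webhook executions

# Effective concurrency: workers x per-worker slots, capped by the limit.
effective=$(( workers * per_worker ))
if [ "$effective" -gt "$limit" ]; then effective=$limit; fi

# Anything beyond effective capacity waits in the queue.
queued=$(( incoming - effective ))
if [ "$queued" -lt 0 ]; then queued=0; fi

echo "effective=$effective queued=$queued"
```

With these assumed numbers, 30 executions run immediately and 60 sit queued until slots free up, which matches the "accumulate, then drain fast" pattern described earlier.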

Now I am also thinking of a workflow-level RabbitMQ “queue”…

Now my :brain: has hit its allowed limit for today.

Cheers!
