Workflows taking longer to execute

Describe the problem/error/question

Lately we have been experiencing longer execution times across all workflows. Workflows that used to take under 1s on average now take around 10s in total, and workflows that used to take 2~3s now take anywhere from 30s to 1 min.

We run n8n on AWS Fargate Spot with autoscaling, using:

  • 1 Webserver: 2 vCPU / 4 GB RAM
  • 4~8 Workers: 2 vCPU / 5 GB RAM (each), with a concurrency of 100
  • 4 Webhook: 0.5 vCPU (512 CPU units) / 3 GB RAM (each)

Also:

  • Database is a Postgres t4g.medium;
  • Execution pruning at 500k executions or 14 days;
  • Postgres Pool Size = 4;
  • Redis (Valkey).
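
For reference, here is a rough sketch of what these settings add up to at the low and high end of autoscaling. It assumes the Postgres pool size applies per worker process, which is my assumption and not something I have verified; the function and key names are only illustrative.

```python
def queue_capacity(workers: int, concurrency: int, pool_size: int) -> dict:
    """Back-of-the-envelope numbers for a queue-mode deployment.

    Assumption: each worker process opens its own pool of `pool_size`
    Postgres connections (not verified against the n8n source).
    """
    return {
        # Upper bound on executions running at once across all workers.
        "max_concurrent_executions": workers * concurrency,
        # Connections opened by workers only (webserver/webhook not counted).
        "worker_db_connections": workers * pool_size,
        # How many in-flight executions share one connection on a busy worker.
        "executions_per_db_connection": concurrency / pool_size,
    }


if __name__ == "__main__":
    for workers in (4, 8):
        print(workers, queue_capacity(workers, concurrency=100, pool_size=4))
```

So a fully loaded worker could have up to 25 executions sharing each database connection, which may or may not be related to what we are seeing.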

From the monitoring, I can barely see any hiccups in the infrastructure. The database is responding well, there is no CPU/RAM over-usage on any of the machines, and there is plenty of space in MemCache. There is also nothing relevant in the logs.

What I do see is that there is usually a huge gap between the total execution time and the sum of the execution times of the individual steps of a workflow. That makes me think something odd happens when workers pick up the tasks, but I can't seem to find what or why.
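
To make the gap concrete, this is roughly how I am measuring it. It is a minimal sketch: the timestamps and per-step durations below are made-up values for illustration, and the function name is hypothetical, not an n8n API.

```python
from datetime import datetime


def unaccounted_seconds(started_at: str, stopped_at: str, step_ms: list[float]) -> float:
    """Total wall-clock time of an execution minus the time spent in its steps.

    started_at / stopped_at: ISO 8601 timestamps of the whole execution.
    step_ms: execution time of each step, in milliseconds.
    """
    total = (
        datetime.fromisoformat(stopped_at) - datetime.fromisoformat(started_at)
    ).total_seconds()
    return total - sum(step_ms) / 1000.0


if __name__ == "__main__":
    # Hypothetical execution: 10s in total, but the steps only add up to 1s.
    gap = unaccounted_seconds(
        "2025-01-01T12:00:00+00:00",
        "2025-01-01T12:00:10+00:00",
        [300, 450, 250],
    )
    print(f"unaccounted time: {gap:.1f}s")
```

A large value here means most of the time is spent outside the steps themselves, for example waiting in the queue or reading and writing execution data, rather than in the workflow logic.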

Any advice on how to scale this properly, or on how to find the origin of the long execution times?

Information on your n8n setup

  • n8n version: 1.105.3
  • Database (default: SQLite): Postgres
  • n8n EXECUTIONS_PROCESS setting (default: own, main): queue
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
  • Operating system: Linux