Describe the problem/error/question
Lately we have been experiencing longer execution times across all workflows. Workflows that used to average under 1s now take about 10s in total, and ones that used to take 2~3s now take anywhere from 30s to 1 min.
We run n8n on AWS Fargate Spot with autoscaling, using:
- 1 Webserver: 2 vCPU / 4 GB RAM
- 4~8 Workers: 2 vCPU / 5 GB RAM (each), with concurrency set to 100
- 4 Webhook: 0.5 vCPU (512 CPU units) / 3 GB RAM (each)
Also:
- Database is a Postgres t4g.medium;
- Execution pruning of 500k executions or 14 days;
- Postgres Pool Size = 4;
- Redis (ValKey)
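For reference, here is a quick back-of-envelope sketch of what these numbers add up to. It only multiplies the figures listed above. The one assumption is that the Postgres pool size applies per n8n process, which I have not verified. The `capacity` helper is just a name I made up for this post.

```python
# Back-of-envelope numbers derived from the setup described above.
WORKERS_MIN, WORKERS_MAX = 4, 8   # autoscaling range for workers
CONCURRENCY_PER_WORKER = 100      # worker concurrency setting
POOL_SIZE = 4                     # Postgres pool size (ASSUMED to be per process)


def capacity(workers: int) -> dict:
    """Return execution slots and DB connections for a given worker count."""
    slots = workers * CONCURRENCY_PER_WORKER
    connections = workers * POOL_SIZE
    return {
        "workers": workers,
        "execution_slots": slots,
        "db_connections": connections,
        # How many concurrent executions could share one pooled connection
        "slots_per_connection": slots / connections,
    }


for n in (WORKERS_MIN, WORKERS_MAX):
    print(capacity(n))
```

If the per-process assumption holds, each worker could have up to 100 executions in flight sharing 4 database connections.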
From the monitoring, I can barely see any hiccups in the infrastructure. The database is responding well, there is no CPU or RAM over-usage on any of the machines, and there is plenty of free memory in the cache. There is also nothing relevant in the logs.
What I do see is that there is usually a large gap between a workflow's total execution time and the execution times of its individual steps, which makes me think something is off in how the workers pick up tasks, but I can't find what or why.
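To make that gap concrete, this is roughly how I compare the two numbers. It is a minimal sketch: the `Execution` class and its field names are made up for illustration and are not n8n's actual schema, the sample values are invented, and it assumes the steps run one after another.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    # Hypothetical fields for illustration, not n8n's real data model
    started_at_ms: int
    stopped_at_ms: int
    node_durations_ms: list[int]


def unaccounted_ms(execution: Execution) -> int:
    """Total wall-clock time minus the time spent inside the steps."""
    total = execution.stopped_at_ms - execution.started_at_ms
    return total - sum(execution.node_durations_ms)


# Invented example: 10s total, but the steps only add up to 850 ms
example = Execution(
    started_at_ms=0,
    stopped_at_ms=10_000,
    node_durations_ms=[200, 350, 300],
)
print(unaccounted_ms(example))  # 9150
```

The remainder is the time I cannot attribute to any step, and that is the part that has grown.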
Any advice on how to scale this properly, or on how to find the source of the long execution times?
Information on your n8n setup
- n8n version: 1.105.3
- Database (default: SQLite): Postgres
- n8n EXECUTIONS_PROCESS setting (default: own, main): queue
- Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
- Operating system: Linux