This is expected and usually confuses people at first.
The per-node timings only show how long the node’s code actively ran. They do not include waiting time.
The total execution time is measured from when the execution is created until it fully finishes. That includes waiting on webhooks, external APIs, network latency, retries, Respond to Webhook behavior, and queue or worker scheduling delays.
So in your example, the nodes really only executed for 444 ms, but the workflow itself existed for 25 seconds while waiting.
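To put numbers on that split, here is a quick back-of-the-envelope sketch using the figures above:

```python
# Sanity check: the "missing" time is the workflow's total lifetime
# minus the time the node code was actively running.
total_execution_ms = 25_000  # workflow existed for ~25 s
node_execution_ms = 444      # sum of the per-node runtimes shown in the UI

wait_ms = total_execution_ms - node_execution_ms
print(f"Waiting (not executing): {wait_ms} ms")  # ~24.5 s of pure wait
```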
Also important: the 4 second timeout applies to individual node operations, not the full workflow lifetime. A workflow can live much longer than that without hitting the timeout.
In short, this is waiting time vs execution time, not a bug. Let me know if this helps!
Thank you very much for your detailed explanation. I totally understand it now. I appreciate you taking the time to break that down for me.
To extend the question slightly, is there a good way to identify what is contributing to the total execution time beyond the node execution itself?
Since in the example I supplied here there is an almost 25 second difference between the total execution time and the sum of the individual nodes’ code execution times, my first assumption is that it is most likely a latency issue with the Postgres database or Redis instance. Would that be a safe assumption, or is there a way to more definitively tell what is causing the jump in total execution time?
You’re welcome! To understand the extra 25s, you’ll need to infer it since n8n only shows per-node execution times and overall start/stop timestamps.
Here’s how:
Inspect the execution timeline: Check the start times of each node in the UI. Gaps between nodes with minimal runtime often indicate waiting on external services, webhooks, or queue delays.
Review logs: If self-hosted, enable debug logs (N8N_LOG_LEVEL=debug) to see when external calls or retries occur.
Monitor infrastructure: Correlate execution times with Redis/Postgres metrics. Enabling /metrics (N8N_METRICS=true) helps spot slow queries or queue backlogs.
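For the metrics route, a small helper can make the Prometheus text output easier to eyeball. This is just a sketch: it assumes N8N_METRICS=true is enabled, that your metrics keep the default n8n_ prefix (the prefix is configurable), and the localhost URL is a placeholder for your own instance.

```python
import re
import urllib.request

def scrape_metrics(url="http://localhost:5678/metrics", prefix="n8n_"):
    """Fetch the Prometheus metrics endpoint and return {metric: value}
    for one prefix. URL and prefix are assumptions; adjust for your setup."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode()
    return parse_metrics(text, prefix)

def parse_metrics(prom_text, prefix="n8n_"):
    """Parse Prometheus exposition-format text into a dict, keeping only
    metrics whose name starts with the given prefix."""
    metrics = {}
    for line in prom_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        m = re.match(r"^(\S+?)(\{[^}]*\})?\s+(\S+)$", line)
        if m and m.group(1).startswith(prefix):
            metrics[m.group(1) + (m.group(2) or "")] = float(m.group(3))
    return metrics
```

Sampling this periodically during one of the slow executions and diffing the queue-related counters against a fast run is the quickest way I know to correlate the spikes.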
Most of the delay is likely due to infrastructure or network issues, but confirming it requires matching timestamps with logs and metrics. People usually don’t worry about execution time; they just want everything to work as expected. It’s nice that you care about your workflow and how you can make it better! Let me know if this helps.
Reviewing the gaps between the start times of each node as screenshotted doesn’t really explain the 25 s differential: only a maximum gap of 435 ms exists between the “Get Okta User” and “If user is DEPROVSIONED” nodes, while all other node start times are separated by only 1–2 ms.
It doesn’t add up to the 25s gap by a long shot there.
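(For reference, this is roughly how I'm reading the gaps: the idle time between one node finishing and the next one starting. A sketch with made-up timestamps, not my real execution data; the per-node start and execution times come from the execution details.)

```python
def node_gaps(run_data):
    """run_data: list of (node_name, start_ms, execution_ms) in run order.

    Returns the idle gap before each node, i.e. the time between the
    previous node finishing and this node starting. Large gaps would
    point at queue/scheduling or external waits rather than node code.
    """
    gaps = []
    prev_end = None
    for name, start_ms, exec_ms in run_data:
        if prev_end is not None:
            gaps.append((name, start_ms - prev_end))
        prev_end = start_ms + exec_ms
    return gaps

# Illustrative numbers only, not the real execution:
timeline = [
    ("Webhook",                 1_700_000_000_000, 2),
    ("Get Okta User",           1_700_000_000_003, 120),
    ("If user is DEPROVSIONED", 1_700_000_000_558, 1),
]
print(node_gaps(timeline))  # biggest gap here is 435 ms, nowhere near 25 s
```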
I will review the metrics endpoint and research how to use it to obtain more familiar information, along with debug logging. Appreciate your guidance.
One of the reasons this is important in this scenario is that the external service utilizing this n8n-driven webhook has a hard limit of 3 s to wait for a response. Most of the time the workflow completes and is able to respond in milliseconds, but we see these spikes to 25–30 s execution times, and the process fails due to the calling system’s upper limit.
In my deployment there is currently a known bottleneck in the networking layer between the n8n worker nodes and the supporting Redis/Postgres services, which I will be addressing next week, so I’m hoping we see this resolved along with that. It would be nice to see more of a smoking gun for the behavior, however, for my own knowledge.
@spyder That 25 s gap is not actually from the node runtimes; those are very short. It’s coming from workflow scheduling and queue/hand-off delays between when n8n accepts the webhook and when the worker actually runs the nodes. n8n only shows active code execution time per node, not the idle/wait time in the queue.
Because your caller only waits 3 s, the safest pattern is to respond immediately to the webhook (return 200) and let the workflow finish in the background. This avoids hitting the caller’s timeout even when queue delays spike. That’s what I found in the official docs and with some googling; let me know if this helps!
Normally I would just respond 200 immediately for a notification-style POST, but this is a GET webhook and n8n does need to gather information and respond with it, so I do need to wait for node completion.
I will review the infrastructure changes and perhaps consider scaling up workers if needed.