Long-Running (10-20 min) Workflow Dying: Why? How to debug multi-loop executions that don't complete

Hi Team,

I'm on n8n Cloud. On my instance, I have a looping workflow that goes non-responsive around loop 62, which is a minimum of 2-3 minutes of run time.

Since it's cloud, I don't really have logging access, so my debugging options are limited.

An example run:
Execution ID 16418 of the “SS Add Rows Reference V4” workflow

Under Executions, the manual execution's status shows as unknown, but I can see in the workflow that it's getting hung up on one of two nodes: one that calls an n8n webhook and another that POSTs over HTTP to a URL.

It's not clear whether I'm hitting Smartsheet's API rate limit with these calls (a 429 error), but it's possible. Logs might help, or I guess I need to ship the responses to Loggly or similar for tracking; that might help too.

I have a Wait node of 2 seconds per round to try to stay under the 300-calls-per-minute limit. I might have to increase this, though at one call every 2 seconds that's only about 30 calls per minute, so unless each loop makes several calls I should already be well under the limit.
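To make the 429 theory concrete, here's roughly what I mean by tracking the responses: a sketch of a fetch wrapper that logs every status code and backs off on 429s. The URL, token, and payload in the usage comment are placeholders, not my actual setup:

```typescript
// Sketch only: a fetch wrapper that logs every response status and
// backs off on 429s. The URL, token, and payload below are placeholders.
async function callWithBackoff(
  url: string,
  init: RequestInit,
  maxRetries = 5,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    console.log(
      `${new Date().toISOString()} ${init.method ?? "GET"} ${url} -> ${res.status}`,
    );
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Honor Retry-After if the API sends it; otherwise back off exponentially.
    const retryAfter = Number(res.headers.get("Retry-After"));
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 2 ** attempt * 1000;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

// Hypothetical usage against Smartsheet's add-rows endpoint:
// await callWithBackoff("https://api.smartsheet.com/2.0/sheets/SHEET_ID/rows", {
//   method: "POST",
//   headers: { Authorization: "Bearer TOKEN", "Content-Type": "application/json" },
//   body: JSON.stringify(rows),
// });
```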

Thanks for any insight on how to debug this.

Best,

I

What is the workflow doing? You may be running out of memory. If that is the case, you can either upgrade your plan or split the workflow so it uses less memory.
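Splitting usually means never holding the full dataset in one execution: hand each sub-workflow one slice at a time (n8n's Split In Batches node does the same thing within a single workflow). A minimal sketch, with an assumed Row type and chunk size:

```typescript
// Sketch: split a large result set into fixed-size chunks so each
// sub-workflow execution only ever holds one chunk in memory.
// Row and the chunk size of 500 are assumptions for illustration.
type Row = Record<string, unknown>;

function* chunks(rows: Row[], size = 500): Generator<Row[]> {
  for (let i = 0; i < rows.length; i += size) {
    yield rows.slice(i, i + size);
  }
}

// Each chunk then becomes one small sub-workflow call instead of
// one giant run:
// for (const batch of chunks(allRows)) { await runSubWorkflow(batch); }
```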

One possible problem you might be facing is hitting the cloud instance's maximum execution time. Executions are time-limited, but I'm not sure how long the timer is.

One way to find out is to open your workflow, go to its settings, and try setting a timeout of 5 hours. n8n will tell you the maximum time a workflow can run when you try this.

You could also try setting the “Save execution progress” option to Yes. This will let you see more precisely where n8n stops, and whether it always happens in the same place.


Thanks, those were apparently set to the defaults (No and the default timeout).

I often forget where to find that settings button, as it only appears for a saved workflow…

I’ll try it and let you know as well, thanks!

Also, the max timeout I can set is 5 minutes on cloud right now… Is there another place I can adjust the global max?

Between execution time and memory limits, I think I'm going to add another abstraction layer and have one workflow call another via webhooks to process smaller chunks of the larger job.

While I'm not processing 500k rows, it's still potentially a couple million key-value pairs and some complex transforms over the course of the execution. It's a lot of history, for sure.
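Concretely, I'm picturing the parent passing only chunk boundaries over the webhook, so the data itself never crosses the boundary and each child execution stays small. A sketch, where the webhook URL and chunk size are placeholders:

```typescript
// Sketch: the parent dispatches (offset, limit) pairs; each child
// execution pulls its own slice from the database. The webhook URL
// and CHUNK_SIZE are placeholders, not my real setup.
const CHUNK_SIZE = 1000;

async function dispatchChunks(totalRows: number): Promise<void> {
  for (let offset = 0; offset < totalRows; offset += CHUNK_SIZE) {
    await fetch("https://myinstance.app.n8n.cloud/webhook/load-chunk", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ offset, limit: CHUNK_SIZE }),
    });
  }
}
```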

Thanks all, I'll update with my progress.

So, since you cannot set a timeout higher than 5 minutes, that is the enforced timeout duration for your workflow.

What is probably happening is that it's either crashing or timing out, and n8n has to kill it.

Breaking your workflow into smaller executions will probably help, but if your main workflow (the one calling the others) does not complete within 5 minutes, it's probably going to stop anyway.

One possibility is to trigger it very frequently and have it process one item at a time, if that's possible, similar to a queue.

Thanks for the ideas, I'll think on this. I think an execution queue of parameterized jobs, held somewhere and then grabbed by an n8n cron or pushed externally at intervals, is probably the ticket.

This workflow is a flexible routine for loading large quantities of data into Smartsheet from a database.

Since it's controlled by parameters anyway, triggering it via webhook from a queue-driver workflow should work, especially if the master workflow doesn't wait for a response code from the loading routine (since I log the results elsewhere anyway). And I could run the workflow in parallel to some degree as well.
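So each cron tick of the queue driver would look roughly like this sketch: pop one parameter set from wherever the queue lives and fire it at the loader's webhook without awaiting the result. popNextJob and the URL are hypothetical:

```typescript
// Sketch of one queue-driver tick. The queue store, popNextJob, and
// the webhook URL are all hypothetical placeholders.
interface LoadJob {
  id: string;
  params: Record<string, unknown>;
}

// Wherever the queue lives (a database table, a sheet, etc.).
declare function popNextJob(): Promise<LoadJob | null>;

async function tick(): Promise<void> {
  const job = await popNextJob();
  if (!job) return; // queue is empty this interval

  // Fire and forget: the loading routine logs its own results
  // elsewhere, so the driver never blocks on the child execution.
  void fetch("https://myinstance.app.n8n.cloud/webhook/load-routine", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job.params),
  });
}
```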
