Workflow stops mid-loop with “error” but no message — only processes 30/400 items

I’m running a loop-based workflow that processes ~400 items one-by-one with a delay node (Wait) between each iteration to reduce load.

Problem:

After around 30 items, the workflow stops unexpectedly and marks the execution as “error” - but:

  • There is no error message shown in the logs
  • No failed node is highlighted
  • The last visible items completed successfully
  • My internet is stable

I suspect it might be a memory issue or something related to long executions / too many iterations (OpenAI and Markdown nodes are involved), but I’ve already:

  • Added a Wait node (2 seconds)
  • Batched everything down to 1 item at a time
  • Enabled Continue on Fail for unstable nodes

Questions:

  • Is there a memory/time limit for full executions even when batching properly?
  • Is there a way to catch/see hidden fatal errors that stop the workflow entirely?
  • How can I make sure this is something that only happens on my setup, so that when I build automations for others the full run of 400 items completes?

Any insights or similar experiences would be really appreciated.

Information on your n8n setup

  • n8n version: 1.91.3 (I think)
  • Database: Google Sheets for now
  • n8n EXECUTIONS_PROCESS setting (default: own, main): own (default single-process execution mode)
  • Running n8n via (Docker, npm, n8n cloud, desktop app): n8n cloud
  • Operating system: Windows

I also suspect this could be memory related, depending on your server setup; it also depends on how big each of the 400 items is. An alternative design would be a pub/sub, event-driven architecture: push the 400 items into a queue, one message per item, then have a subscriber pick up and process one message at a time in serial.
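If you go that route, here is a rough sketch of the “one small message per item” idea using a Code node set to Run Once for All Items. The `name` field is an assumption on my part; adjust it to whatever columns your sheet actually has.

```js
// Code node (Run Once for All Items): strip each row down to just the
// fields the subscriber needs, so every queued message stays small.
// "name" is an assumed column - replace it with your real fields.
return $input.all().map((item, index) => ({
  json: {
    position: index,      // where this item sits in the batch, handy for resuming
    name: item.json.name,
  },
}));
```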

That has already been done.

The 400 items are just names from a database, and the Loop Over Items is a 1-by-1 loop, so only a single item gets pushed forward at a time.

The problem is that I leave to do other things while the automation is running, and when I check the executions list afterwards it doesn’t tell me what the error was.

What I did catch one time was a “connection error”, which seems impossible: my internet connection and my computer are both high end, and nobody else is on the internal network. That makes me think there has to be a problem on their servers, maybe because I’m sending too many requests, or something else I can’t identify.

I’ve already added a pause for smoother runs, the data is split into batches of 1, and on error it keeps moving.

I need to understand this so it doesn’t happen when I hand these automations over to other people.

Have you thought about moving the processing into a separate workflow and calling that workflow from inside this one? It may handle the processing better.

You can also add better error handling.

On my request nodes, in the node settings you can set the error handling to continue on error. This keeps the workflow running when that node fails; if you enable it on the other error-prone nodes as well, a single failure won’t stop the whole flow (though downstream nodes that depend on that node’s data may then fail too).

The execution will keep processing, and you can pass handy details through this way to isolate where and why something failed; see the sketch below.
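As a minimal sketch of that idea, a Code node placed after a node that has Continue on Fail enabled could collect the failures into a summary. This assumes failed items expose an `error` field in their JSON (the usual way Continue on Fail surfaces them, but verify on your version) and an assumed `name` field from the source data.

```js
// Code node (Run Once for All Items), placed after a node with
// "Continue on Fail" enabled. Assumes failed items carry an "error"
// field in their JSON - check how your n8n version reports this.
const failures = $input.all()
  .filter((item) => item.json.error)
  .map((item) => ({
    name: item.json.name,   // assumed field from the source data
    error: item.json.error,
  }));

// Pass a summary forward so you can see where and why items failed
// when you come back and open the execution later.
return [{ json: { failedCount: failures.length, failures } }];
```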

I would also recommend adding some Stop and Error nodes in the places where you think the failure may be happening.

Additionally, the errors you do get can be forwarded to another workflow: in the workflow settings, set the Error Workflow option and choose the workflow you want to call.

The workflow you call needs to start with an Error Trigger node; from there you can send an alert with the error details.
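For the alert itself, a small Code node directly after the Error Trigger could format the details into one message. The field names below follow the Error Trigger’s usual output (workflow name, last node executed, error message, execution URL), but double-check them against your n8n version.

```js
// Code node inside the error workflow, directly after the Error Trigger.
// Builds a short alert text from the trigger payload. Field names are
// based on the Error Trigger's usual output - verify in your version.
const data = $input.first().json;

const message = [
  `Workflow "${data.workflow.name}" failed`,
  `Last node: ${data.execution.lastNodeExecuted}`,
  `Error: ${data.execution.error.message}`,
  `Execution: ${data.execution.url}`,
].join('\n');

// Send "message" on to Slack / email / whatever alert channel you use.
return [{ json: { message } }];
```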

Hopefully this helps :slight_smile:


Yes, I’ve already set up error handling in the nodes (like you said, I’m using “Continue”). The issue isn’t with that part; it’s that the memory/state gets stuck, and I don’t have a clue how to make sure items complete the full workflow, free up that memory, and restart from zero.
I’m still getting the hang of chaining workflows together, since I’ve mostly been focused on handling everything in a single flow with error management (maybe that’s the reason I suck, haha). But honestly, your advice about splitting the work into separate workflows is probably the key to solving this; I just need to make sure the memory gets reset when I do it. Other than that, you’re totally right about all of this!

I’ll make sure those workflows work and then I’ll close out this question.

Yes, also feel free to check your system specs.

I can’t find the actual cloud sizing, but it is referenced here slightly, so maybe the cloud plan you’re on isn’t that performant RAM/CPU-wise.

So I am wondering about trying a local dev environment in Docker, which you can monitor more easily; that may help you find out how much RAM/CPU this execution actually needs.

I have it running on my own server at home, but I’ve been really lazy about setting up all the credentials and everything on it, since here it’s so simple: just Google sign-in and that’s it. On my own server it’s an extra 10 steps. :sweat_smile:


Yes, that’s correct, haha. By the way, if my reply helped, I would really appreciate it if you could mark it as the solution.

Hope everything is going well.

Many thanks,

Samuel