Endless workflow execution

Hello community,
In my execution history list, I can see workflow executions that seem to be still running:

Those finished a while ago, but how can I remove them from this history? When I click on the “STOP” icon, nothing happens.

Any idea?
Jérémy

Hey @Jeremy_controlc.io!

Welcome to the community :sparkling_heart:

May I know which version of n8n are you using? Also, can you share what nodes you’re using in these workflows?

You can restart your instance to kill all these running executions.

Hi @harshil1712,

I was on v0.107.0 (from n8n cloud). I just upgraded to the latest stable version (0.123.1) and restarted my instance, which removed all the “in progress” executions.

I think I’ve corrected what made it happen, so I guess this won’t come up again.
But thanks a lot for the tip!

Hi Jérémy! When you clicked the ‘stop’ button, did absolutely nothing happen or did you get an error?

Also, could you share what you changed to fix the problem?

Hi — I’m having the same issue with hundreds of executions at the same time since yesterday, but I can’t find a way to get rid of them all at once.

I’m running an instance in queue mode

Hey @ricardo!

Welcome to the community :sparkling_heart:

I am sorry you’re facing an issue. Can you share some more details? Are you on the latest version? Which nodes are you using in your workflows? Did you make changes before you started getting the issue?

Hey @ricardo

Welcome to our community forums!

This might happen if you have no workers available; in that case, all your executions will be stuck.

Is it a new issue that you are facing? Can you confirm that your workers are working properly?

Hi, when I clicked the Stop button, nothing happened (just like in the screenshot: the disabled stop square with the loading icon).

My workflow used a webhook as a trigger, and I had initially set the Response Mode to “On Received” instead of “Last Node”. Once I made that change, I didn’t get the problem anymore.
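
For illustration, this is roughly what the relevant Webhook node settings look like in the workflow JSON (a minimal sketch; the method and path are just placeholders):

    {
      "name": "Webhook",
      "type": "n8n-nodes-base.webhook",
      "parameters": {
        "httpMethod": "POST",
        "path": "example-path",
        "responseMode": "lastNode"
      }
    }

With "onReceived" the webhook responds as soon as it is called; with "lastNode" it waits for the workflow to finish and returns the data of the last node.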

I was able to reproduce the problem:
While building my workflow, I clicked the run button for a “SplitInBatch” node. The problem showed up and I now have a new “endless execution” in my executions list.

Hey @Jeremy_controlc.io!

Can you please answer the following questions? This will help us better understand the problem.
What is the size of the incoming data?
Does the workflow run into an infinite loop?
Can you please share your workflow? Please make sure you are not sharing any sensitive data :slight_smile:

Are you on the latest version?

We are on 0.126.1

Which nodes are you using in your workflows?

We are using the Postgres node and the HTTP node

Did you make changes before you started getting the issue?

No, this seems to happen after a daily cron job is executed. The cron job gets about 20k rows at most and makes one HTTP request for every row. After that, the memory of the workers shoots up and stays up indefinitely.

Since then we have been seeing ECONNRESET errors from the HTTP node to random endpoints, even 12 hours after the initial cron job.

Yes, I have two workers online (though Kubernetes activates more if the CPU goes over the limit). They are working fine as far as I can tell; they are processing jobs. Even after deleting the workers and activating them again, the issue persists.

My problem originates differently, but plays out somewhat similarly:

  1. CRON job executes once a day
  2. Postgres query retrieves ~20000 records
  3. One HTTP request is executed per row (20000 requests)

From then on I have problems for hours.

I see this issue when trying to display the full query, indicating that the node contains 658 KB of data: https://cdn.zappy.app/65981df2a47dcae24aa559a333c14c09.png

Hey @ricardo

According to the provided information, I don’t think it’s a problem in the scaling process itself.

It looks like the data volume is too large and n8n might be having issues dealing with it.

Can you perhaps divide this cron into 2 separate workflows, working with 10k lines at a time, just so we can see if n8n works fine for a smaller dataset?
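
Another way to work with a smaller chunk at a time, inside a single workflow, would be the SplitInBatches node, looping the HTTP node back into it until all batches are done (keep in mind the execution data still adds up over the whole run). A minimal sketch of the node configuration, with an example batch size:

    {
      "name": "SplitInBatches",
      "type": "n8n-nodes-base.splitInBatches",
      "parameters": {
        "batchSize": 1000
      }
    }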

The main problem I can see is that, while working through all that data, n8n keeps accumulating information in memory, so you may be running out of RAM or n8n may be struggling to continue due to memory limitations.

Do you have any memory monitoring in place to see how it is behaving?
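
As a side note on the per-row requests: the HTTP Request node also has batching options (under Options) that space the requests out. A rough sketch of the relevant parameters (the URL and values are placeholders, and the option names are worth double-checking against your version):

    {
      "name": "HTTP Request",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://example.com/endpoint",
        "options": {
          "batchSize": 50,
          "batchInterval": 1000
        }
      }
    }

That sends the requests in groups of 50 with a one-second pause between groups, which might also help with the ECONNRESET errors if the target endpoints are getting overwhelmed.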

Thanks for following up — we do have monitoring in place. When the job happens, memory definitely shoots up: https://cdn.zappy.app/6202cf6c74f3e48a4a027b83f480747c.png

There you can see the moment when it happened (10:00pm) and when I reset the workers (8:15am)

The weird part for me is that the issue persists after that for hours.

Update: The HTTP nodes that end up failing, fail with this error:

NodeApiError: UNKNOWN ERROR - check the detailed error for more information
    at Object.execute (/usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes/HttpRequest.node.js:832:27)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async /usr/local/lib/node_modules/n8n/node_modules/n8n-core/dist/src/WorkflowExecute.js:424:47

We found and fixed a few issues around this. It turns out that those jobs are not actually running anymore; they just did not get correctly removed from an internal “running-list” and so were wrongly displayed as running. That is also the reason why they could not be stopped (they had already stopped). Hopefully we found all the places where this happened.

I will update here once the new version with those fixes gets released. That will likely be Saturday or Sunday.

Thanks a lot for the support. I’ll update my instance next week and check if everything runs properly.

Fix got released with [email protected]