Endless workflow execution

Hello community,
In my execution history list, I can see workflow executions that seem to be still running:

Those finished a while ago, but how can I remove them from this history? When I click on the “STOP” icon, nothing happens.

Any idea?
Jérémy

Hey @Jeremy_controlc.io!

Welcome to the community :sparkling_heart:

May I know which version of n8n are you using? Also, can you share what nodes you’re using in these workflows?

You can restart your instance to kill all these running executions.

Hi @harshil1712,

I was on v0.107.0 (from n8n cloud). I just upgraded to the latest stable version (0.123.1) and restarted my instance, which removed all the “in progress” executions.

I think I’ve corrected what made it happen, so I guess this won’t come up again.
But thanks a lot for the tip!

Hi Jérémy! When you clicked the ‘stop’ button, did absolutely nothing happen or did you get an error?

Also, could you share what you changed to fix the problem?

Hi — I’m having the same issue with hundreds of executions at the same time since yesterday, but I can’t find a way to get rid of them all at once.

I’m running an instance in queue mode

Hey @ricardo!

Welcome to the community :sparkling_heart:

I am sorry you’re facing an issue. Can you share some more details? Are you on the latest version? Which nodes are you using in your workflows? Did you make changes before you started getting the issue?

Hey @ricardo

Welcome to our community forums!

This might happen if you have no workers available; in that case, all your executions will be stuck.

Is it a new issue that you are facing? Can you confirm that your workers are working properly?

Hi, when I clicked the Stop button, nothing happened (just like in the screenshot: the disabled stop square with the loading icon).

My workflow used a webhook as a trigger, and I had initially set the Response Mode to “On Received” instead of “Last Node”. Once I made that change, I didn’t get the problem anymore.
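
For illustration, this is roughly what the relevant Webhook node settings look like in the workflow JSON (a minimal sketch; the method and path are just placeholders):

    {
      "name": "Webhook",
      "type": "n8n-nodes-base.webhook",
      "parameters": {
        "httpMethod": "POST",
        "path": "example-path",
        "responseMode": "lastNode"
      }
    }

With "onReceived" the webhook responds as soon as it is called; with "lastNode" it waits for the workflow to finish and returns the data of the last node.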

I was able to reproduce the problem:
While building my workflow, I clicked the run button for a “SplitInBatch” node. The problem showed up and I now have a new “endless execution” in my executions list.

Hey @Jeremy_controlc.io!

Can you please answer the following questions? This will help us better understand the problem.
What is the size of the incoming data?
Does the workflow run into an infinite loop?
Can you please share your workflow? Please make sure you are not sharing any sensitive data :slight_smile:

Are you on the latest version?

We are on 0.126.1

Which nodes are you using in your workflows?

We are using the Postgres node and the HTTP node

Did you make changes before you started getting the issue?

No, this seems to happen after a daily cron job is executed. The cron job gets about 20k rows at most and makes one HTTP request for every row. After that, the memory of the workers shoots up and stays up indefinitely.

Since then we have been seeing ECONNRESET errors from the HTTP node to random endpoints, even 12 hours after the initial cron job.

Yes, I have two workers online (though Kubernetes activates more if the CPU goes over the limit). They are working fine as far as I can tell; they are processing jobs. Even after deleting the workers and activating them again, the issue persists.

My problem originates differently, but plays out somewhat similarly:

  1. CRON job executes once a day
  2. Postgres query retrieves ~20000 records
  3. One HTTP request is executed per row (20000 requests)

From then on I have problems for hours.

I see this issue when trying to display the full query, indicating that the node contains 658 KB of data: https://cdn.zappy.app/65981df2a47dcae24aa559a333c14c09.png

Hey @ricardo

According to the provided information, I don’t think it’s a problem in the scaling process itself.

It looks like the data volume is too large and n8n might be having issues dealing with it.

Can you perhaps divide this cron into 2 separate workflows, working with 10k lines at a time, just so we can see if n8n works fine for a smaller dataset?
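
Another way to work with a smaller chunk at a time, inside a single workflow, would be the SplitInBatches node, looping the HTTP node back into it until all batches are done (keep in mind the execution data still adds up over the whole run). A minimal sketch of the node configuration, with an example batch size:

    {
      "name": "SplitInBatches",
      "type": "n8n-nodes-base.splitInBatches",
      "parameters": {
        "batchSize": 1000
      }
    }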

The main problem I can see is that, while working through all that data, n8n keeps accumulating information in memory, so you may be running out of RAM or n8n may be struggling to continue due to memory limitations.

Do you have any memory monitoring in place to see how it is behaving?
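
As a side note on the per-row requests: the HTTP Request node also has batching options (under Options) that space the requests out. A rough sketch of the relevant parameters (the URL and values are placeholders, and the option names are worth double-checking against your version):

    {
      "name": "HTTP Request",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://example.com/endpoint",
        "options": {
          "batchSize": 50,
          "batchInterval": 1000
        }
      }
    }

That sends the requests in groups of 50 with a one-second pause between groups, which might also help with the ECONNRESET errors if the target endpoints are getting overwhelmed.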

Thanks for following up — we do have monitoring in place. When the job happens, memory definitely shoots up: https://cdn.zappy.app/6202cf6c74f3e48a4a027b83f480747c.png

There you can see the moment when it happened (10:00pm) and when I reset the workers (8:15am)

The weird part for me is that the issue persists after that for hours.

Update: The HTTP nodes that end up failing, fail with this error:

NodeApiError: UNKNOWN ERROR - check the detailed error for more information
    at Object.execute (/usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes/HttpRequest.node.js:832:27)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async /usr/local/lib/node_modules/n8n/node_modules/n8n-core/dist/src/WorkflowExecute.js:424:47

We found and fixed a few issues around this. It turns out that those jobs are not actually running anymore; they just did not get correctly removed from an internal “running-list” and so were wrongly displayed as running. That is also the reason why they could not be stopped (they had already stopped). Hopefully we found all the places where this happened.

I will update here once the new version with those fixes gets released. That will likely be Saturday or Sunday.

Thanks a lot for the support. I’ll update my instance next week and check if everything runs properly.

Fix got released with [email protected]