Unable to execute workflow on random in queue mode

Jon · September 26, 2022, 4:28pm

They look unrelated to me. Have you changed anything in the core code for n8n? It would be nice to be able to rule out any changes to the Docker image you are using.

I will drop a message to the chap internally that knows about Bull to see if he has any thoughts.

sulabhsuneja · September 26, 2022, 4:54pm

Hey @Jon

I am from the DevOps team, so I don’t have much knowledge about the custom code. I will ask a developer to look into the custom code and check again.

Jon · September 27, 2022, 7:43am

Hey @sulabhsuneja,

I have had a chat with the chap that knows about Bull. Is that error coming from the main instance or one of the workers?

If it is the main instance, we will need to know what trigger nodes you are using so we can see what is going on. If it is one of the workers, you might need to reduce the concurrency level for it.

We think either the main instance or the worker instance is getting overloaded, so we just need to work out which one.

sulabhsuneja · September 27, 2022, 1:43pm

Hey @Jon

This error is coming from the worker instance. At first it was showing:

Problem with execution 83200: job stalled more than maxStalledCount. Aborting

and then, on the next execution, it was showing:

Error: Missing lock for job 934 finished
    at Queue.onFailed (/home/node/node_modules/bull/lib/job.js:516:18)
    at Queue.emit (node:events:526:28)
    at Queue.emit (node:domain:475:12)
    at Redis.messageHandler (/home/node/node_modules/bull/lib/queue.js:444:14)
    at Redis.emit (node:events:526:28)
    at Redis.emit (node:domain:475:12)
    at DataHandler.handleSubscriberReply (/home/node/node_modules/ioredis/built/DataHandler.js:80:32)
    at DataHandler.returnReply (/home/node/node_modules/ioredis/built/DataHandler.js:47:18)
    at JavascriptRedisParser.returnReply (/home/node/node_modules/ioredis/built/DataHandler.js:21:22)
    at JavascriptRedisParser.execute (/home/node/node_modules/redis-parser/lib/parser.js:544:14)
    at Socket.<anonymous> (/home/node/node_modules/ioredis/built/DataHandler.js:25:20)
    at Socket.emit (node:events:526:28)
    at Socket.emit (node:domain:475:12)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at Socket.Readable.push (node:internal/streams/readable:228:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)

Thanks

Jon · September 27, 2022, 3:55pm

Hey @sulabhsuneja,

In that case it might be worth lowering your worker concurrency to see if that helps, or adding more worker nodes to your setup. You can find a bit more about that stalled count message here: Job stalled more than max Stalled Count - #5 by krynble

sulabhsuneja · October 17, 2022, 1:18pm

Hey @Jon @netroy @MutedJam

Thanks for the response you provided. I am now able to run workflows and have resolved the Bull issue.

However, I am processing a huge amount of data with workflows containing 20-30 nodes. After a workflow executes, a query runs that updates the execution_entity table and saves the data; because of this, the execution’s status doesn’t update and is set to “Unknown”.

Can you suggest a workaround so that node data is not saved in the execution_entity table and only the other fields are?

Thanks

MutedJam · October 18, 2022, 9:16am

Hi @sulabhsuneja, it’s either full data or nothing when it comes to execution data, I am afraid. You can only enable or disable storing execution data using the EXECUTIONS_DATA_SAVE_ON_ERROR, EXECUTIONS_DATA_SAVE_ON_SUCCESS, EXECUTIONS_DATA_SAVE_ON_PROGRESS and EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS environment variables documented here.

For your scenario there are two different approaches I could think of:

1. Don’t store the n8n execution data; instead, create your own database table outside of the n8n db containing only the information you need (for example workflow id and execution timestamp) and nothing else. You could implement this using a database node running only once at the end of your workflow.
2. Alternatively, consider using sub-workflows to break down the huge amount of data you’re processing into smaller chunks. So instead of processing thousands of rows at once, process chunks of 100 rows each. Sub-workflows can be started using the Execute Workflow node. This would reduce the amount of execution data for each sub-workflow execution.

sulabhsuneja · October 18, 2022, 9:19am

Thanks for the quick response, @MutedJam .
The second option, with sub-workflows, sounds much better since it would help us know which workflow executions completed successfully.

MutedJam · October 18, 2022, 9:21am

No worries. Here’s an example of what such a parent/sub-workflow combination could look like: Download image from Url(s) in a Json File - #4 by MutedJam

This approach will also reduce the memory consumption greatly and keep it somewhat predictable (assuming your data structure doesn’t change).
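
For anyone who wants to see the chunking idea in isolation, here is a minimal standalone sketch of splitting rows into batches of 100 before handing each batch to a sub-workflow. It is not taken from the linked example: the `Row` type, the `chunk` helper and the placeholder data are all made up for illustration, and in a real workflow the rows would come from your own source.

```typescript
// Hypothetical row shape (assumption); real items carry whatever your source returns.
type Row = Record<string, unknown>;

// Split an array into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Placeholder data (assumption): 250 rows become chunks of 100, 100 and 50.
const rows: Row[] = Array.from({ length: 250 }, (_, i) => ({ id: i }));
const batches = chunk(rows, 100);
console.log(batches.map((b) => b.length)); // prints [ 100, 100, 50 ]
```

Each batch would then be passed to one sub-workflow execution, so a single execution only ever holds about 100 rows of data. Inside n8n itself the Split In Batches node can do this splitting for you, so a helper like this is mainly useful for reasoning about batch sizes.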