Unable to execute workflows at random in queue mode

Hm, I am afraid I don’t know exactly why the dependency resolution here might fail. @netroy could you kindly take a look at what might be failing here?

Hey @MutedJam , @netroy any update on the above issue?
Were you able to replicate it? I could really use some help here.

Thanks


Hey @sulabhsuneja, Thanks for reporting this.

The error screenshot you posted suggests that your package-lock.json file is not the same as the one from the n8n repo.
Is it possible that you ran npm install locally with an older version of npm, and that this file is then picked up by the Docker build? I don’t see any other reason why the build would complain that package-lock.json was created with an old version of npm.

Are you using the n8n repo directly, or do you have a fork? If you are using a fork, please make sure that you are using an up-to-date package-lock.json.

Hey @netroy . Thanks for the update.
I am using Node 16.17 and npm 8.1.2. Locally, npm install is working fine.

I am using a fork of the n8n repo, and its package-lock.json is already up to date, but the build is still failing.

Thanks

Hey, is the fork public? Is there a branch that I could look at?

Hey @netroy
The fork is not public, as we have made some customizations for our project, but thanks for the help. I was able to figure out that the failure was caused by the latest Node 16 minor version and its bundled npm, which enforces a strict peer dependency check.
Please check this comment: npm ERR while resolving [email protected] · Issue #7095 · vuejs/vue-cli · GitHub

So I created a custom base image with a pinned Node version, rebuilt the Docker image for n8n, and it worked.

Thanks

Hey @netroy @MutedJam
Thanks for the help earlier. I am stuck on another queue issue.
It would be really great if someone could take a look at it.

return new Error('Missing lock for job ' + jobId + ' ' + command);

2022-09-26T17:33:13.953+05:30	^
2022-09-26T17:33:13.953+05:30	Error: Missing lock for job 898 failed
2022-09-26T17:33:13.953+05:30	at Object.finishedErrors (/home/node/node_modules/bull/lib/scripts.js:189:16)
2022-09-26T17:33:13.953+05:30	at Job.moveToFailed (/home/node/node_modules/bull/lib/job.js:342:19)
2022-09-26T17:33:13.953+05:30	at processTicksAndRejections (node:internal/process/task_queues:96:5)

I have googled it and it seems related to the queue, but I haven’t been able to find a solution.

Thanks

Hey @sulabhsuneja,

What are the resources looking like for the Redis server?

Hey @Jon

I am using AWS managed Redis clusters and the metrics look fine (utilization below 10 percent).

Thanks

Hey @sulabhsuneja,

That was the only thing I could find on the Bull side. Is this an error you are always seeing, or is it intermittent?

@Jon, thanks for the quick response.
This is the only error I am getting right now. I have enabled logs, and all the nodes are working perfectly and finishing successfully, but the workflow executions are just not getting updated, I guess. I’ll also take a look into the database to see if anything is logged there.

Meanwhile, I also found some links on the same issue.

Please check if you can figure something out from them.

Thanks

Hey @sulabhsuneja,

They look to be unrelated to me. Have you changed anything in the core code for n8n? It would be nice to be able to rule out any changes in the Docker image you are using.

I will drop a message to the chap internally who knows about Bull to see if he has any thoughts.

Hey @Jon

I am from the DevOps team, so I don’t have much knowledge about the custom code. I will ask a developer to look into it and check again.

Hey @sulabhsuneja,

I have had a chat with the chap who knows about Bull. Is that error coming from the main instance or from one of the workers?

If it is the main instance, we will need to know what trigger nodes you are using so we can see what is going on. If it is one of the workers, you might need to reduce the concurrency level for it.

We think what is happening is that either the main instance or a worker instance is getting overloaded, so we just need to work out where.


Hey @Jon

This error is coming from the worker instance, but first it was showing

Problem with execution 83200: job stalled more than maxStalledCount. Aborting

and then on the next execution, it was showing

Error: Missing lock for job 934 finished
    at Queue.onFailed (/home/node/node_modules/bull/lib/job.js:516:18)
    at Queue.emit (node:events:526:28)
    at Queue.emit (node:domain:475:12)
    at Redis.messageHandler (/home/node/node_modules/bull/lib/queue.js:444:14)
    at Redis.emit (node:events:526:28)
    at Redis.emit (node:domain:475:12)
    at DataHandler.handleSubscriberReply (/home/node/node_modules/ioredis/built/DataHandler.js:80:32)
    at DataHandler.returnReply (/home/node/node_modules/ioredis/built/DataHandler.js:47:18)
    at JavascriptRedisParser.returnReply (/home/node/node_modules/ioredis/built/DataHandler.js:21:22)
    at JavascriptRedisParser.execute (/home/node/node_modules/redis-parser/lib/parser.js:544:14)
    at Socket.<anonymous> (/home/node/node_modules/ioredis/built/DataHandler.js:25:20)
    at Socket.emit (node:events:526:28)
    at Socket.emit (node:domain:475:12)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at Socket.Readable.push (node:internal/streams/readable:228:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)

Thanks

Hey @sulabhsuneja,

In that case it might be worth tweaking your worker concurrency and lowering it to see if that helps, or adding more worker nodes to your setup. You can find a bit more about that stalled count message here: Job stalled more than max Stalled Count - #5 by krynble
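For a bit of background on where those messages come from: they are raised by Bull when the worker processing a job stops renewing the job’s lock, and the stalled-job checker then sees the job stall more often than maxStalledCount. n8n creates and configures its Bull queue internally, so the sketch below is for illustration only (a plain standalone Bull v3 setup, not n8n settings), just to show which knobs the errors relate to:

```javascript
// Plain Bull v3 sketch, for illustration only – n8n wires its own queue up
// internally, so none of these options are n8n configuration.
const Queue = require('bull');

const queue = new Queue('jobs', 'redis://localhost:6379', {
  settings: {
    lockDuration: 30000,    // ms a worker may hold a job's lock before the job counts as stalled
    lockRenewTime: 15000,   // ms between automatic lock renewals while the job is running
    stalledInterval: 30000, // ms between checks for stalled jobs
    maxStalledCount: 1,     // how many times a job may stall before it is failed for good
  },
});

// If this process's event loop is blocked (for example by too many heavy
// executions running concurrently), the lock renewals above are delayed, the
// job is seen as stalled, and later operations on it raise "Missing lock for job <id>".
queue.process(2 /* concurrency */, async (job) => {
  // ... do the actual work here ...
  return job.data;
});
```

Lowering the concurrency (or adding workers) helps because a less busy event loop renews the locks on time, which is why fewer parallel executions per worker usually makes these errors go away.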


Hey @Jon @netroy @MutedJam

Thanks for the responses you provided. I was able to resolve the Bull issue and can run workflows now.

However, I am processing a huge amount of data with workflows containing 20-30 nodes. After a workflow execution, a query runs which updates the execution_entity table and saves the execution data, and because of that the execution status doesn’t get updated and is set to “Unknown”.

Can you please tell me a workaround to not save the node data in the execution_entity table and save only the other fields?

Thanks

Hi @sulabhsuneja, it’s either full data or nothing when it comes to execution data, I am afraid. You can only enable or disable storing execution data using the EXECUTIONS_DATA_SAVE_ON_ERROR, EXECUTIONS_DATA_SAVE_ON_SUCCESS, EXECUTIONS_DATA_SAVE_ON_PROGRESS and EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS environment variables documented here.
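For example, a common setup that keeps the database small is to store data for failed executions only; something like the below (the exact accepted values are described in the docs linked above, so treat this as an illustration to adapt rather than a recommendation):

```
EXECUTIONS_DATA_SAVE_ON_ERROR=all
EXECUTIONS_DATA_SAVE_ON_SUCCESS=none
EXECUTIONS_DATA_SAVE_ON_PROGRESS=false
EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS=false
```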

For your scenario there are two different approaches I could think of:

  • Don’t store the n8n execution data, but instead create your own database table outside of the n8n db, containing only the information you need (for example the workflow id and execution timestamp) and nothing else. You could implement this using a database node running only once at the end of your workflow.
  • Alternatively, consider using sub-workflows to break the huge amount of data you’re processing down into smaller chunks. So instead of processing thousands of rows at once, process chunks of 100 rows each. Sub-workflows can be started using the Execute Workflow node, and this would reduce the amount of execution data for each sub-workflow execution (see the sketch after this list).
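To give a rough idea of what the chunking could look like, here is a minimal sketch of a Function node in the parent workflow that groups the incoming items into batches before they are handed to the Execute Workflow node. The chunk size and the batch field name are assumptions you would adapt to your data:

```javascript
// Hypothetical Function node in the parent workflow: group incoming items
// into batches so each Execute Workflow call only processes one small chunk.
const chunkSize = 100; // assumption – pick whatever size your sub-workflow handles comfortably

const batches = [];
for (let i = 0; i < items.length; i += chunkSize) {
  batches.push({
    json: {
      // one output item per batch; the sub-workflow receives this array
      batch: items.slice(i, i + chunkSize).map((item) => item.json),
    },
  });
}

return batches;
```

You would then loop over these batch items (for example with a Split In Batches node in front of the Execute Workflow node) so that only one chunk’s data ends up in each sub-workflow execution.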

Thanks for the quick response, @MutedJam .
The second option with sub-workflows sounds much better, since it would also help us know which workflow executions completed successfully.


No worries. Here’s an example of how such a parent/sub-workflow combination could look: Download image from Url(s) in a Json File - #4 by MutedJam

This approach will also reduce the memory consumption greatly and keep it somewhat predictable (assuming your data structure doesn’t change).
