Hi,
I'm using queue mode in n8n, running the main server and the worker as separate pods via Docker/Kubernetes.
I have noticed that after some time (a few hours, sometimes a day; it's not deterministic), the worker pod (Bull) stops consuming new messages even though jobs do exist in the queue (I have verified this in Redis). When I restart the worker pod, it starts consuming those jobs again. There are no errors from the 'error' event, but sometimes, after a few more hours, I see the error below in the worker; once this error is received, the worker starts consuming messages again:
Error from queue: Error: read ECONNRESET
at TCP.onStreamRead (internal/stream_base_commons.js:209:20) {
errno: -104,
code: 'ECONNRESET',
syscall: 'read'
}
This is how I'm initialising Bull and registering the process handler in worker.ts:
Worker.jobQueue = new Bull(jobName, { prefix, redis: redisOptions, enableReadyCheck: false, settings: { maxStalledCount: 30 } });
Worker.jobQueue.process(flags.concurrency, async (job) =>
this.runJob(job),
);
...
async runJob(job: Bull.Job): Promise<IBullJobResponse> {
// some code
return {
success: true,
};
}
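Given that the ECONNRESET above looks like the Redis connection being dropped while idle (my working theory is that something between the worker pod and Redis resets long-idle TCP connections), one thing I plan to try is spelling out the ioredis connection options instead of relying on the defaults, and passing the object as redis: redisOptions exactly as above. This is only a sketch; the host, port and concrete values are placeholders, not what we currently run:

import { RedisOptions } from 'ioredis';

// Placeholder values; in our setup host/port come from the k8s service config.
const redisOptions: RedisOptions = {
  host: 'redis',
  port: 6379,
  // Keep reconnecting forever, backing off up to 5 s between attempts.
  retryStrategy: (times) => Math.min(times * 500, 5000),
  // Ask the OS to send TCP keepalive probes on otherwise idle connections.
  keepAlive: 10000,
  connectTimeout: 10000,
  // Let the blocking job-fetch connection wait for a reconnect instead of
  // failing queued commands after 20 retries (the ioredis default).
  maxRetriesPerRequest: null,
};

If the root cause really is an idle connection being reset by conntrack or a load balancer, the keepalive plus an unbounded retryStrategy should at least make the worker reconnect instead of sitting on a dead blocking connection, but I have not verified this yet.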
Since I'm not receiving any error event when the worker stops consuming new messages, this is hard to debug. Could you let me know what could possibly trigger this issue and how I can fix it?
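In the meantime, to get more visibility, I'm thinking of handing Bull a createClient factory so that the connection-level events of every ioredis client it creates (client, subscriber and bclient) get logged, together with the queue-level events Bull emits. A rough, untested sketch (queue name and prefix are placeholders; redisOptions is the same object we already pass today):

import Bull from 'bull';
import Redis, { RedisOptions } from 'ioredis';

const prefix = 'bull';    // placeholder; taken from config in our worker
const jobName = 'jobs';   // placeholder queue name

// Factory Bull calls for its 'client', 'subscriber' and 'bclient' connections,
// so connection drops are no longer invisible.
const createClient = (type: string, redisOpts?: RedisOptions) => {
  const client = new Redis(redisOpts ?? {});
  client.on('error', (err) => console.error(`redis ${type} error:`, err));
  client.on('close', () => console.warn(`redis ${type} connection closed`));
  client.on('reconnecting', () => console.warn(`redis ${type} reconnecting`));
  client.on('end', () => console.warn(`redis ${type} connection ended`));
  return client;
};

const jobQueue = new Bull(jobName, {   // assigned to Worker.jobQueue in our worker.ts
  prefix,
  createClient,
  redis: redisOptions,                 // existing options object
  enableReadyCheck: false,
  settings: { maxStalledCount: 30 },
});

// Queue-level events that might show what happens right before consumption stops.
jobQueue.on('error', (err) => console.error('queue error:', err));
jobQueue.on('stalled', (job) => console.warn(`job ${job.id} stalled`));
jobQueue.on('failed', (job, err) => console.error(`job ${job.id} failed:`, err));

If one of the three connections (especially the blocking one the worker fetches jobs with) dies without the queue-level 'error' event firing, this should at least make it visible in the pod logs.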
Important NOTE: we are not using the latest n8n version; instead, we have pulled the relevant queue-mode code and all related changes into our build. Also, everything works fine when I run n8n's queue mode directly via npm locally (./packages/cli/bin/n8n worker). The issue only appears when running via Docker/Kubernetes, which is how we run n8n and need to keep running it.
Versions in use:
- bull: 4.10.2
- ioredis: 5.2.4
- Node.js: 14.15