WAIT node resumes and runs multiple times if there are multiple worker nodes

shriram.balakrishnan · September 8, 2023, 1:24pm

Describe the problem/error/question

We are running n8n in queue mode with two worker nodes.

We use the WAIT node to wait for more than 65 seconds. But when the time comes to resume the workflow, both workers pick up the job and run it. As a result, the workflow is executed twice.

How can we prevent this double execution?

Note - This does not happen if I run just 1 worker node

What is the error message (if any)?

NA

Please share your workflow

NA

Share the output returned by the last node

NA

Information on your n8n setup

n8n version: 0.236.0
Database (default: SQLite): postgres
n8n EXECUTIONS_PROCESS setting (default: own, main): NA
Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
Operating system: NA

Jon · September 8, 2023, 1:59pm

Hey @shriram.balakrishnan,

Welcome to the community

You are on an older version of n8n, so I would start with an update and see if that helps. Can you also share the configuration you are using for the workers and main instance so we can see if we are able to reproduce this?

shriram.balakrishnan · September 8, 2023, 3:03pm

Hi @Jon ,

By configuration, do you mean the various environment variables that we have set as mentioned here - Configuration methods | n8n Docs ?

Jon · September 8, 2023, 3:19pm

Hey @shriram.balakrishnan,

That, and how you have n8n deployed. So if you are using compose, share the compose file.

shriram.balakrishnan · September 9, 2023, 6:39am

Hi @Jon

Currently I do not have this information.

Can you let me know how I can run multiple worker nodes on my local machine? I will try to reproduce this locally so that I can share more details.

shriram.balakrishnan · September 11, 2023, 7:01am

Hi @Jon

Can you share if you have any updates on this?

Jon · September 11, 2023, 10:05am

Hey @shriram.balakrishnan,

Are you able to find out how you have n8n deployed at the moment? I suspect someone would know, and it would be very handy to have that information so we can try to reproduce this. If you want to try to reproduce this on your own, you can follow the queue mode docs we have on our site, but to be honest it would be more useful to know the actual details of how you are running it now.

shriram.balakrishnan · September 11, 2023, 12:07pm

Hi @Jon

Please find the information about our n8n setup.

We have a kubernetes setup of n8n where we have used official helm chart v0.136.0 for n8n v0.236.0.

Execution Mode - Queue
Docker Image - n8nio/n8n:0.236.0
Horizontal Pod Autoscaling configuration (HPA) for worker - min 2 & max 4
Redis is running in Standalone mode within that kubernetes cluster

  DEBUG_ENV: dev
  DB_TYPE: postgresdb
  DB_POSTGRESDB_HOST: 
  DB_POSTGRESDB_DATABASE: 
  DB_POSTGRESDB_PORT: 5432
  NODE_FUNCTION_ALLOW_BUILTIN: "*"
  NODE_FUNCTION_ALLOW_EXTERNAL: "*"
  NODE_ENV: development
  N8N_HOST: 
  N8N_LOG_LEVEL: debug
  N8N_LOG_OUTPUT: "console,file"
  N8N_PORT: 5678
  N8N_SKIP_WEBHOOK_DEREGISTRATION_SHUTDOWN: "true"
  OUTSIDE_K8S_CLUSTER: "true"
  QUEUE_HEALTH_CHECK_ACTIVE: true
  QUEUE_WORKER_TIMEOUT: 30
  EXECUTIONS_DATA_MAX_AGE: 336
  EXECUTIONS_DATA_PRUNE: true
  WEBHOOK_URL: 
  N8N_EDITOR_BASE_URL: 
  N8N_METRICS: true

shriram.balakrishnan · September 11, 2023, 12:55pm

Additionally, I found this answer from 2020 that discusses “Scaling n8n in Kubernetes”:

Yes, that would work fine. But be aware that it will only work correctly for Webhook-Nodes. For every other Trigger-Node that would cause problems.

Does this apply even now @Jon ?

Jon · September 11, 2023, 1:31pm

Hey @shriram.balakrishnan,

That post seems unrelated and is more about webhooks and calling them than about wait nodes. The settings you gave look like they should be OK. What does the workflow look like in the execution log?

shriram.balakrishnan · September 12, 2023, 7:37am

Hi @Jon ,

I was able to reproduce this with a simpler workflow while having 2 worker nodes.
In this case, Webhook2 is called twice after the wait time (from the flow that has Webhook1 as its starting node).

Description of Workflow:

The flow that has the webhook trigger “Webhook1” does the following:

Waits for 80 seconds using the “WAIT” node
Makes an HTTP GET request to “Webhook2”

Screenshot attached for reference

After the wait time is complete, Webhook2 is called twice. Here are the execution logs for each call:

Execution log of webhook1 call

Execution log of webhook2 call - First time

Execution log of webhook2 call - Second time
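
For anyone trying to reproduce this, it can help to count the calls outside of n8n as well. The following is only a minimal sketch, assuming you temporarily point the HTTP Request node at a small local endpoint instead of Webhook2. The names `CountingHandler` and `start_counter` are made up for this example and are not part of n8n.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for Webhook2 that records every GET it receives."""

    hits = []  # request paths, in arrival order
    lock = threading.Lock()

    def do_GET(self):
        # Record the request before answering, so the count is
        # up to date by the time the caller sees the response.
        with self.lock:
            self.hits.append(self.path)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        # Silence the default per-request logging.
        pass


def start_counter(port=0):
    """Start the counting endpoint in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After one run of the Webhook1 flow, `len(CountingHandler.hits)` should be 1. A value of 2 would confirm that the step after the WAIT node ran twice. Adding a query parameter of your choosing to the request URL makes it easier to tell the two calls apart in `hits`.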

Jon · September 12, 2023, 7:41am

Perfect, I will see if I can reproduce that now that I have a better idea of what you are doing.

shriram.balakrishnan · September 13, 2023, 12:30pm

Hi, can you share if you have any updates on this @Jon ?

Jon · September 13, 2023, 1:16pm

Hey @shriram.balakrishnan,

I have not had a chance to set up a test yet. I am going to do it this afternoon.

Jon · September 13, 2023, 2:44pm

Hey @shriram.balakrishnan,

I have just given this a go and for me it is only running once. Can you share your helm chart so I can see how you have the workers running? Do you also have multiple webhook workers or is it just the one?

At the moment this looks like it could be a configuration issue but you are also on an older n8n version so it could be worth trying a newer release.

Instead of calling a webhook, could you try the execute workflow option and call another workflow that way, to see if that also results in the triggering happening twice?

shriram.balakrishnan · September 14, 2023, 5:48am

Hi @Jon

Yes, we are currently running 2 instances of main, webhook and worker containers in our Kubernetes cluster. Please let me know if you need any other information.
Also, regarding the latest version: please confirm whether this docker image (1.6.1) is the latest stable version to use in a production environment.

Jon · September 14, 2023, 8:18am

Hey @shriram.balakrishnan,

Just to check something there… Are you sure you are running 2 main instances of n8n? We only support running one, and having 2 can cause issues. I would double check this and make sure you only have the one main instance.

We consider any version tagged as latest to be the version to use for production, but normal best practices would apply, so make sure you upgrade a test environment first and do any testing you need to do before signing it off for your change.

As you are moving from pre v1 to v1 you will also need to follow the migration guide as v1 contains a number of breaking changes.
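
To illustrate why a second main instance could produce the duplicate run described above, here is a toy simulation. It is not n8n's actual code. It assumes, purely for illustration, that each main instance polls the database for waiting executions that are due and pushes them onto the queue. The table layout and the function names (`poll_naive`, `poll_with_claim`, `simulate`) are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the executions table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE executions (id INTEGER PRIMARY KEY, status TEXT, wait_till INTEGER)"
    )
    db.execute("INSERT INTO executions VALUES (1, 'waiting', 80)")
    db.commit()
    return db


def poll_naive(db, now):
    """Return ids of waiting executions that are due, without claiming them."""
    rows = db.execute(
        "SELECT id FROM executions WHERE status = 'waiting' AND wait_till <= ?",
        (now,),
    ).fetchall()
    return [row[0] for row in rows]


def poll_with_claim(db, now):
    """Return only the due executions this caller managed to claim."""
    claimed = []
    for exec_id in poll_naive(db, now):
        # The status check in the WHERE clause makes the claim atomic:
        # only one caller can flip a given row from 'waiting' to 'claimed'.
        cur = db.execute(
            "UPDATE executions SET status = 'claimed' WHERE id = ? AND status = 'waiting'",
            (exec_id,),
        )
        if cur.rowcount == 1:
            claimed.append(exec_id)
    db.commit()
    return claimed


def simulate(use_claim):
    """Let two pretend main instances poll at t=100 and return what gets queued."""
    db = make_db()
    queue = []
    if use_claim:
        queue += poll_with_claim(db, 100)  # first main instance
        queue += poll_with_claim(db, 100)  # second main instance finds nothing
    else:
        seen_by_a = poll_naive(db, 100)
        seen_by_b = poll_naive(db, 100)  # same row is still 'waiting'
        queue += seen_by_a + seen_by_b
    return queue
```

In this sketch `simulate(False)` queues execution 1 twice, while `simulate(True)` queues it once. Under the assumption above, that would match the symptom in this thread: the workers each process one queued job correctly, and the duplication happens before the job reaches them. Running a single main instance, as suggested, avoids the situation altogether.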

system · September 21, 2023, 8:18am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.