[Queue Mode] Workflows stuck in “Starting soon” + Error after restart — Redis cleaned, still persists

Hi n8n team,

We are running n8n in queue mode with scaling and Redis, and recently started encountering a large number of executions stuck in the “Error” + “Starting soon” state.

Here are our setup details and what we’ve done so far:

:white_check_mark: Setup Info:

  • **n8n version:** 1.85.4
  • PostgreSQL version: 16
  • Redis: 7
  • Deployment: Docker Compose
  • Executions Mode: queue
  • Workers Enabled: N8N_RUNNERS_ENABLED=true
  • Offload manual execs: OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true

:package: .env / docker-compose (relevant section):

```yaml
environment:
  - EXECUTIONS_MODE=queue
  - N8N_RUNNERS_ENABLED=true
  - OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true
  - QUEUE_BULL_REDIS_HOST=redis
  - QUEUE_BULL_REDIS_PORT=6379
  - GENERIC_TIMEZONE=Europe/Bucharest
```

:broom: Actions already taken:

  • Flushed Redis completely via FLUSHALL
  • Cleared executions from PostgreSQL (see the diagnostic sketch after this list) with:

```sql
DELETE FROM public.execution_entity
WHERE status IS NULL
   OR status NOT IN ('success', 'error', 'manual');
```

  • Restarted all containers
  • Checked logs for errors; nothing critical apart from:

```
Error – Starting soon
```

  • Timestamp – 29064405:15m
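
For reference, a minimal diagnostic sketch for finding the stuck rows before deleting them; the column names ("waitTill", "startedAt", "createdAt") and the status values are assumptions based on a typical n8n 1.x Postgres schema, so verify them against your own database first:

```sql
-- Hypothetical diagnostic query: list executions that never left the queue.
-- Column names and status values assume a stock n8n 1.x schema; check the
-- actual table with \d execution_entity before relying on them.
SELECT id, "workflowId", status, "createdAt", "startedAt", "waitTill"
FROM public.execution_entity
WHERE status IN ('new', 'waiting', 'running')
   OR "waitTill" IS NOT NULL
ORDER BY "createdAt" DESC
LIMIT 50;
```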

:cross_mark: Current behavior:

  • Workflows fail to run properly and get stuck in “Starting soon” + “Error”
  • Invalid-looking timestamps in execution logs (e.g. 29064405:15m)
  • Recreating workflows does not solve the issue

:folded_hands: What we need:

As you can see, some of these workflows do not even have a schedule trigger enabled and are not active, yet they still end up in this error state.

Hi,

When did this problem start? Did you upgrade? The timestamp you refer to is the runtime of the execution in seconds. Might it have something to do with daylight saving time, i.e. that it got confused? From the description it is not clear whether you deleted everything from the PostgreSQL DB or only a filtered subset. If possible, clear everything.

Do you have any /metrics stats for waiting executions?
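
In case it helps, exposing those metrics from the main instance looks roughly like this. `N8N_METRICS` is the documented switch for the Prometheus /metrics endpoint; the queue-metrics flag only exists in more recent 1.x releases, so treat this as a sketch and check it against your version:

```yaml
# Sketch only: add to the main n8n service's environment, then fetch
# https://<your-host>/metrics (e.g. with curl) and look at the queue/waiting gauges.
environment:
  - N8N_METRICS=true
  # Only in newer 1.x releases; drop this line if your version does not know it.
  - N8N_METRICS_INCLUDE_QUEUE_METRICS=true
```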

Regards,
J.

We also have the same problem!!!

I keep deleting such executions but they keep on coming! They start as errors before even beginning to run!

Exactly. I have tried everything, and it is annoying because it hides all the other runs and makes them hard to analyse. So the only solution for now is to remove them?

Hi,

You have some Wait nodes in there. Can you tell me their duration?

Also, my 2 cents: there may be an execution entity that has a valid status but still has a waitTill set. n8n might be trying to bring it live again while the data needed to execute it has already been removed, causing an error.

In my view, the only way to find out is to delete all execution data, leaving only workflows, credentials and the absolutely necessary tables, and to bring the containers down before doing that in Postgres. A sketch of what that could look like is below.
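
A minimal sketch of that cleanup, assuming the stock n8n 1.x Postgres schema; take a pg_dump backup first and verify the table names on your own database:

```sql
-- Run only while all n8n containers are stopped, and only after a backup.
-- TRUNCATE ... CASCADE also empties tables that reference execution_entity
-- via foreign keys (e.g. execution_data); table names assume a stock n8n 1.x schema.
TRUNCATE TABLE public.execution_entity CASCADE;
```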

reg,
J.

I am facing the same issue after enabling queue mode. Any help, please?
The following is my Docker configuration:

```yaml
services:
  traefik:
    image: "traefik"
    restart: always
    command:
      - "--api=true"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.mytlschallenge.acme.tlschallenge=true"
      - "--certificatesresolvers.mytlschallenge.acme.email=${SSL_EMAIL}"
      - "--certificatesresolvers.mytlschallenge.acme.storage=/letsencrypt/acme.json"
    ports:
      - "8080:80"
      - "2087:443"
    volumes:
      - traefik_data:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro

  n8n:
    image: docker.n8n.io/n8nio/n8n
    restart: always
    depends_on:
      - postgres
      - redis
    ports:
      - "5678:5678"
    labels:
      - traefik.enable=true
      - traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`)
      - traefik.http.routers.n8n.tls=true
      - traefik.http.routers.n8n.entrypoints=web,websecure
      - traefik.http.routers.n8n.tls.certresolver=mytlschallenge
      - traefik.http.middlewares.n8n.headers.SSLRedirect=true
      - traefik.http.middlewares.n8n.headers.STSSeconds=315360000
      - traefik.http.middlewares.n8n.headers.browserXSSFilter=true
      - traefik.http.middlewares.n8n.headers.contentTypeNosniff=true
      - traefik.http.middlewares.n8n.headers.forceSTSHeader=true
      - traefik.http.middlewares.n8n.headers.SSLHost=${DOMAIN_NAME}
      - traefik.http.middlewares.n8n.headers.STSIncludeSubdomains=true
      - traefik.http.middlewares.n8n.headers.STSPreload=true
      - traefik.http.routers.n8n.middlewares=n8n@docker
    environment:
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - N8N_PORT=5678
      - N8N_PROTOCOL=https
      - NODE_ENV=production
      - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}:2087
      - GENERIC_TIMEZONE=Asia/Riyadh
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - QUEUE_BULL_REDIS_DB=1
      - EXECUTIONS_MODE=queue
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=n8n
      - N8N_RUNNERS_ENABLED=true
      - OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true
      - N8N_WORKER_EXECUTIONS=true
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
    volumes:
      - n8n_data:/home/node/.n8n

  n8n-worker:
    image: docker.n8n.io/n8nio/n8n
    restart: always
    depends_on:
      - n8n
      - redis
    environment:
      - NODE_ENV=production
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - QUEUE_BULL_REDIS_DB=1
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=n8n
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - N8N_PORT=5678
      - N8N_PROTOCOL=https
      - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}:2087/
      - GENERIC_TIMEZONE=Asia/Riyadh
      - N8N_RUNNERS_ENABLED=true
      - OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true
      - N8N_WORKER_EXECUTIONS=true
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
    volumes:
      - n8n_data:/home/node/.n8n

  postgres:
    image: postgres:14
    restart: always
    environment:
      - POSTGRES_USER=n8n
      - POSTGRES_PASSWORD=n8n
      - POSTGRES_DB=n8n
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5435:5432"

  redis:
    image: redis:6
    restart: always
    ports:
      - "6380:6379"

volumes:
  traefik_data:
    external: true
  n8n_data:
    external: true
  postgres_data:
```

Same issue here.
But on my instance, it always sets the executions to error as soon as I restart the processes.
So once in the evening, when all processes are restarted, it automatically sets most of the executions to “Error” and sets the finish date to the time of the restart.
Interestingly, they still keep all the other data in the Postgres DB, so if you open a failed execution you still see all the data as if it had run just fine, and if you change the flag in the Postgres DB it also looks fine again.
I created a Postgres node that automatically changes the status back whenever an execution looks like it was failed this way (roughly like the sketch below). After every update I try to deactivate this step again, but so far the problem still comes up, so I just live with it for now haha
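
For illustration only, the kind of UPDATE such a node could run; the column names and the time window are assumptions based on a stock n8n execution_entity table, so adapt them to your own schema and your actual restart time:

```sql
-- Illustrative only: "status" / "stoppedAt" and the time window are assumptions;
-- replace the window with the timestamps around your nightly restart.
UPDATE public.execution_entity
SET status = 'success'
WHERE status = 'error'
  AND "stoppedAt" BETWEEN TIMESTAMP '2025-01-01 22:00:00'
                      AND TIMESTAMP '2025-01-01 22:05:00';
```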

same issue

Same issue here.
It persists even when I turn everything off: the trigger node still tries to run while the whole workflow is deactivated.
How can I resolve it?

Hey all,
I was able to troubleshoot and fix this issue.
It’s definitely a weird edge case, but I have a feeling it’s everyone’s issue here.
Make sure you don’t allow Redis to be accessed from the public internet!
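
For example, in a Compose setup like the one above, hardening Redis could look roughly like this; the password value is a placeholder, and `QUEUE_BULL_REDIS_PASSWORD` is the n8n variable that passes it to the queue connection:

```yaml
# Sketch only: keep Redis off the host network and require a password.
redis:
  image: redis:6
  restart: always
  # no "ports:" mapping, so Redis is reachable only from the internal Compose network
  command: ["redis-server", "--requirepass", "change-me"]   # placeholder password

# and on the n8n and n8n-worker services, add:
#   - QUEUE_BULL_REDIS_PASSWORD=change-me
```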