Task request timed out after 60 seconds — External runner + Queue mode on Kubernetes (v2.9.4)

n8n version: 2.9.4
Deployment: Self-hosted on Kubernetes
Mode: Queue mode with Redis
Database: PostgreSQL
Runner image: Custom build on top of n8nio/runners:2.9.4


The error

Every Code node execution — both JavaScript and Python — fails with:

Task request timed out after 60 seconds. Your Code node task was not matched to a runner within the timeout period. This indicates that the task runner is currently down, or not ready, or at capacity.

This happens in both production (queue mode via workers) and manual test executions.


Our setup

We have 4 services running:

  • n8n-main — main instance
  • n8n-worker — queue mode worker
  • n8n-main-task-runner — n8nio/runners sidecar for main
  • n8n-worker-runner — n8nio/runners sidecar for worker

n8n-main and n8n-worker both have:

N8N_RUNNERS_ENABLED=true
N8N_RUNNERS_MODE=external
N8N_RUNNERS_AUTH_TOKEN=<shared-secret>
N8N_RUNNERS_BROKER_LISTEN_ADDRESS=0.0.0.0
N8N_NATIVE_PYTHON_RUNNER=true

n8n-worker also has:

N8N_PROCESS=worker

(We are not sure if this is the correct way to start a worker — is command: ["n8n", "worker"] in the K8s spec required instead?)

n8n-worker-runner has:

N8N_RUNNERS_AUTH_TOKEN=<shared-secret>
N8N_RUNNERS_TASK_BROKER_URI=http://n8n-worker.namespace.svc.cluster.local:5679
N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT=15

Our custom n8n-task-runners.json (mounted at /etc/n8n-task-runners.json):

{
  "task-runners": [
    {
      "runner-type": "javascript",
      "workdir": "/opt/runners/task-runner-javascript",
      "health-check-server-port": "5681",
      "env-overrides": {
        "NODE_FUNCTION_ALLOW_BUILTIN": "crypto",
        "NODE_FUNCTION_ALLOW_EXTERNAL": "moment,uuid"
      }
    },
    {
      "runner-type": "python",
      "workdir": "/opt/runners/task-runner-python",
      "health-check-server-port": "5682",
      "env-overrides": {
        "PYTHONPATH": "/opt/runners/task-runner-python",
        "N8N_RUNNERS_STDLIB_ALLOW": "json",
        "N8N_RUNNERS_EXTERNAL_ALLOW": "numpy,pandas"
      }
    }
  ]
}

Our custom Dockerfile for the runner image:

FROM n8nio/runners:2.9.4
USER root
# Install the extra packages allow-listed in the runner config
RUN cd /opt/runners/task-runner-javascript && pnpm add moment uuid
RUN cd /opt/runners/task-runner-python && uv pip install numpy pandas
# Replace the stock launcher config with our custom one
COPY n8n-task-runners.json /etc/n8n-task-runners.json
USER runner

What we’ve already tried

  • Verified auth tokens match across all 4 services
  • Verified encryption keys match between main and worker
  • Confirmed N8N_RUNNERS_BROKER_LISTEN_ADDRESS=0.0.0.0 is set on both n8n instances
  • Confirmed the worker-runner’s N8N_RUNNERS_TASK_BROKER_URI points to the worker, not main
  • Checked network connectivity between runner sidecar and n8n broker port 5679

Specific questions

Q1. Our n8n-task-runners.json does not have command or args fields. We believe this may be the primary cause — the launcher connects to the broker but has no process to spawn. Can anyone confirm what the correct command and args values are for n8nio/runners:2.9.4 for both runners?

Q2. Is N8N_PROCESS=worker a valid env var for starting a worker in v2.x? The docs only show command: n8n worker in Docker Compose examples. Does this translate to command: ["n8n", "worker"] in a Kubernetes deployment spec?

Q3. We noticed GitHub issue #25468 about the -I Python flag causing ModuleNotFoundError: No module named 'src'. Does n8nio/runners:2.9.4 still have this issue, or has it been patched?

Q4. Are NODE_FUNCTION_ALLOW_BUILTIN, NODE_FUNCTION_ALLOW_EXTERNAL, and N8N_RUNNERS_STDLIB_ALLOW effective as container-level environment variables on the n8nio/runners container? Or must they always go inside env-overrides in the JSON config? We currently have them in both places and are unsure which takes precedence.

Q5. OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true is set on main. Does this mean the n8n-main-task-runner sidecar is completely unnecessary and can be removed?


Happy to share full container logs from any of the 4 services. Any help greatly appreciated — this has been blocking production for some time.

Thanks :folded_hands:

1 Like

Hi @kanra

you do need them. without command and args the launcher has no process to spawn, so tasks never get picked up even if it reaches the broker. check the stock /etc/n8n-task-runners.json in the base n8nio/runners image (your COPY overwrites it) for the exact values for both runners, and verify those paths actually exist inside your custom container.

N8N_PROCESS=worker doesn’t do anything. n8n ignores it, which means your worker pod is likely spinning up as a second main instance and never consuming the queue. swap it for args: ["worker"] on the container in your k8s spec (the image entrypoint already runs n8n, so this is the equivalent of command: n8n worker in Compose). this is very likely part of the root cause.
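
rough sketch of that container spec (image tag and names are assumptions, match them to your setup):

containers:
  - name: n8n-worker
    image: n8nio/n8n:2.9.4    # assumed tag, match your n8n version
    args: ["worker"]          # image entrypoint already invokes n8n
    env:
      - name: N8N_RUNNERS_ENABLED
        value: "true"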

not sure if that python -I bug is patched in 2.9.4. haven’t seen anything confirming it either way.

put those allow-list vars straight into the env-overrides block in the json. the launcher reads them directly from there.

drop it. since you have OFFLOAD_MANUAL_EXECUTIONS_TO_WORKERS=true, main isn’t executing workflows at all. the n8n-main-task-runner sidecar is completely unnecessary.

might be worth fixing the k8s worker command first and seeing if tasks start routing.

2 Likes

Hello, thanks for the help so far. Update and new findings below.
Added args: ["worker"] to the worker container spec.

However the issue still persists.

Worker logs confirm it is picking up jobs from Redis correctly:

Worker started execution 167864 (job 8610)
Task request timed out
Worker finished execution 167864 (job 8610)

But the runner logs show it has never connected to the broker at all:

INFO [launcher:js] Waiting for task broker to be ready...
INFO [launcher:py] Waiting for task broker to be ready...

And then nothing. No broker:runnerregistered. No connection. It’s been stuck in this state since startup.

So the worker is real and processing jobs, but its broker has zero runners registered, which is why every Code node task times out. (correct?)

Is there anything else that could cause the runner to be permanently stuck on “Waiting for task broker to be ready” other than a networking/port issue?

1 Like

Hey @kanra

Is port 5679 exposed in your worker Service?

- port: 5679 
  targetPort: 5679
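
For reference, a minimal worker Service carrying the broker port might look like this (names are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: n8n-worker            # placeholder name
spec:
  selector:
    app: n8n-worker           # must match your worker pod labels
  ports:
    - name: task-broker
      port: 5679
      targetPort: 5679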
2 Likes

Yes, I’m positive port 5679 is already exposed in our worker Service.

Just to make sure we’re looking at the right thing: our runner is actually a separate pod (not a sidecar container in the same pod as the worker), and it connects to the worker broker via:

N8N_RUNNERS_TASK_BROKER_URI: http://<redacted>-n8n-worker.<redacted>.svc.cluster.local:5679

The runner logs show it has been stuck on this since startup with no further output:

INFO [launcher:js] Waiting for task broker to be ready...
INFO [launcher:py] Waiting for task broker to be ready...

A few questions to help us narrow this down:

  1. Are you asking about the worker’s Service port, or does the runner pod also need its own Service with ports exposed?
  2. Could the runner being in a separate pod (rather than a sidecar container sharing the same pod) cause this? Should we move it into the same pod as the worker instead?
  3. Is there anything specific in the runner logs we should look for beyond “Waiting for task broker to be ready” to diagnose why it can’t connect?
2 Likes

Can you confirm N8N_RUNNERS_BROKER_LISTEN_ADDRESS=0.0.0.0 is set on the worker pod? By default the broker only listens on localhost, which means that even with port 5679 exposed on the Service, connections from a separate pod are refused before they ever reach the broker.
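
If it isn’t, it’s a single env entry on the worker container (standard Deployment syntax):

env:
  - name: N8N_RUNNERS_BROKER_LISTEN_ADDRESS
    value: "0.0.0.0"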

To answer your questions after research:

  1. The runner pod doesn’t need its own Service; it initiates the connection outbound to the worker broker, not the other way around

  2. Separate pod is completely fine and is the standard pattern

  3. Add N8N_RUNNERS_LAUNCHER_LOG_LEVEL=debug on the runner container; once the broker is actually reachable you’ll immediately see broker:runnerregistered followed by runner:taskoffer in the logs

check these docs: Task runners | n8n Docs and Task runner environment variables | n8n Docs

Let me know what happens

3 Likes

Thank you for your response!
Yes, N8N_RUNNERS_BROKER_LISTEN_ADDRESS=0.0.0.0 has been set on the worker pod from the start.

To give full context on where we are, here is everything we have already verified and ruled out:

Ruled out:

  • N8N_RUNNERS_BROKER_LISTEN_ADDRESS=0.0.0.0 on worker :white_check_mark:
  • Port 5679 exposed in worker’s Service :white_check_mark:
  • Auth tokens identical across worker and worker-runner :white_check_mark:
  • Worker correctly starting as a worker (confirmed via worker logs) :white_check_mark:
  • Encryption keys matching between main and worker :white_check_mark:
  • Broker URI on runner pointing to correct worker service :white_check_mark:
  • Separate pod architecture (confirmed this is fine per your reply) :white_check_mark:

What the logs show:

Worker logs confirm it is picking up jobs from Redis correctly:

Worker started execution 167864 (job 8610)
Task request timed out
Worker finished execution 167864 (job 8610)

Runner logs show it has never connected to the broker since startup:

INFO [launcher:js] Waiting for task broker to be ready...
INFO [launcher:py] Waiting for task broker to be ready...

Nothing after that. No broker:runnerregistered, no error, no connection attempt visible.

What we’ve gone through:

  • Official n8n task runner docs
  • Task runner environment variables docs
  • task-runner-launcher GitHub setup docs
  • GitHub issues #25468, #22798, #23175
  • Multiple community forum threads on the same timeout error

We are now adding N8N_RUNNERS_LAUNCHER_LOG_LEVEL=debug to the runner container as you suggested and will share the output here shortly.

Is there anything else you’d expect us to see in the debug logs given everything above has already been ruled out?

2 Likes

Thanks to both of you! Here is a full update with debug logs and current state.

@houda_ben debug logs added

We added N8N_RUNNERS_LAUNCHER_LOG_LEVEL=debug to the runner container. Here are the logs:

2026/03/10 12:42:37 INFO Starting launcher's health check server at port 5680
2026/03/10 12:42:37 INFO [launcher:py] Starting launcher goroutine...
2026/03/10 12:42:37 DEBUG [launcher:py] Changed into working directory: /opt/runners/task-runner-python
2026/03/10 12:42:37 DEBUG [launcher:py] Env vars to pass to runner: [HOME N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT PATH N8N_RUNNERS_TASK_BROKER_URI ...]
2026/03/10 12:42:37 INFO [launcher:py] Waiting for task broker to be ready...
2026/03/10 12:42:37 INFO [launcher:js] Starting launcher goroutine...
2026/03/10 12:42:37 DEBUG [launcher:js] Changed into working directory: /opt/runners/task-runner-javascript
2026/03/10 12:42:37 DEBUG [launcher:js] Env vars to pass to runner: [HOME N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT PATH N8N_RUNNERS_TASK_BROKER_URI ...]
2026/03/10 12:42:37 INFO [launcher:js] Waiting for task broker to be ready...

Nothing further after that. What would you expect to see next in the logs at this point, and what does the absence of further output indicate to you?


@OMGItsDerek Q1 — command/args: Confirmed these were missing. We’ve now added them and will verify the exact paths inside the container as you suggested.

Q2 — worker command: Our worker was already using args: ["worker"] before this thread. Worker is confirmed healthy and picking up jobs from Redis.

Q4 — env vars: Moved NODE_FUNCTION_ALLOW_BUILTIN and NODE_FUNCTION_ALLOW_EXTERNAL to JSON env-overrides as recommended.


Current confirmed state:

  • Worker healthy, picking up jobs from Redis :white_check_mark:
  • N8N_RUNNERS_BROKER_LISTEN_ADDRESS=0.0.0.0 on worker :white_check_mark:
  • Port 5679 exposed in worker Kubernetes Service :white_check_mark:
  • Auth tokens match :white_check_mark:
  • Runner broker URI: http://<redacted>-worker:5679 :white_check_mark:
  • Runner reads config and changes into working directories successfully :white_check_mark:
  • Runner still stuck on Waiting for task broker to be ready :cross_mark:

Any ideas what could cause the launcher to stop producing output entirely after this point?

1 Like

Hi @kanra

The logs show exactly what’s happening: the launcher never gets a response from the broker, which usually points to a Kubernetes NetworkPolicy silently dropping the traffic.

Try adding an ingress rule on the worker allowing TCP port 5679 from the runner pods.

something like this:

ingress:
  - from:
      - podSelector:
          matchLabels:
            app: <your-runner-label>
    ports:
      - protocol: TCP
        port: 5679

If you don’t have a NetworkPolicy yet, create one for the worker that includes this rule. Restart both pods after applying.
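
For reference, a complete policy might look something like this (the name and labels are placeholders for your setup):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-runner-to-worker-broker   # placeholder name
spec:
  podSelector:
    matchLabels:
      app: <your-worker-label>          # selects your worker pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: <your-runner-label>  # selects your runner pods
      ports:
        - protocol: TCP
          port: 5679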

:crossed_fingers:

2 Likes

The NetworkPolicy fix worked — runner is now connecting. @houda_ben thank you!

After adding the ingress rule allowing TCP 5679 from the runner pod to the worker pod, the runner successfully connects to the broker. Here are the debug logs confirming the full handshake:

2026/03/17 08:56:14 DEBUG [launcher:js] Task broker is ready
2026/03/17 08:56:15 DEBUG [launcher:js] Connected: ws://n8n-worker:5679/runners/_ws?id=9dd2484b02dfba1b
2026/03/17 08:56:15 DEBUG [launcher:js] <- Received message `broker:inforequest`
2026/03/17 08:56:15 DEBUG [launcher:js] -> Sent message `runner:info`
2026/03/17 08:56:15 DEBUG [launcher:js] <- Received message `broker:runnerregistered`
2026/03/17 08:56:15 DEBUG [launcher:js] -> Sent message `runner:taskoffer` for offer ID `228b7f95e5463e76`
2026/03/17 08:56:15 INFO  [launcher:js] Waiting for launcher's task offer to be accepted...
2026/03/17 09:00:19 DEBUG [launcher:js] <- Received message `broker:taskofferaccept`
2026/03/17 09:00:19 DEBUG [launcher:js] -> Sent message `runner:taskdeferred`
2026/03/17 09:00:19 DEBUG [launcher:js] Task ready for pickup, launching runner...
2026/03/17 09:00:21 DEBUG [runner:js] Health check server listening on 0.0.0.0, port 5681

New problem — runner works a few times then stops permanently.

Code nodes executed successfully exactly 3 times, then failed on every subsequent run. Restarting the runner pod produces the same pattern — a few successes then permanent failure. The test code node was as simple as return [{json: {test: 'JS works'}}].

After the 3rd success there is no further reconnection or re-registration visible in the logs. The launcher does not appear to be sending a new runner:taskoffer after the first few tasks complete.


Current JSON config:

{
  "task-runners": [
    {
      "runner-type": "javascript",
      "workdir": "/opt/runners/task-runner-javascript",
      "command": "/usr/local/bin/node",
      "args": ["/opt/runners/task-runner-javascript/dist/start.js"],
      "health-check-server-port": "5681",
      "allowed-env": ["PATH", "HOME", "NODE_OPTIONS", "N8N_RUNNERS_*", "NODE_*"],
      "env-overrides": {
        "NODE_FUNCTION_ALLOW_BUILTIN": "*",
        "NODE_FUNCTION_ALLOW_EXTERNAL": "*",
        "N8N_RUNNERS_HEALTH_CHECK_SERVER_HOST": "0.0.0.0",
        "N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT": "0"
      }
    },
    {
      "runner-type": "python",
      "workdir": "/opt/runners/task-runner-python",
      "command": "/opt/runners/task-runner-python/.venv/bin/python",
      "args": ["-m", "src.main"],
      "health-check-server-port": "5682",
      "allowed-env": ["PATH", "HOME", "PYTHONPATH", "N8N_RUNNERS_*"],
      "env-overrides": {
        "N8N_RUNNERS_STDLIB_ALLOW": "*",
        "N8N_RUNNERS_EXTERNAL_ALLOW": "numpy,pandas",
        "N8N_RUNNERS_HEALTH_CHECK_SERVER_HOST": "0.0.0.0",
        "N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT": "0"
      }
    }
  ]
}

Current runner pod env:

N8N_RUNNERS_AUTH_TOKEN: <shared-secret>
N8N_RUNNERS_TASK_BROKER_URI: http://n8n-worker:5679
N8N_RUNNERS_LAUNCHER_LOG_LEVEL: debug
N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT: 0

Two things we are unsure about:

  1. Does allowed-env support wildcard patterns like N8N_RUNNERS_* and NODE_*? Or must every variable be listed explicitly? If wildcards are not supported, critical variables like N8N_RUNNERS_TASK_BROKER_URI may not be reaching the spawned runner process.

  2. N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT: 0 is currently set in both the container env and in env-overrides in the JSON. Is this the correct way to keep the runner alive between tasks, or does the auto-shutdown timeout need to be configured somewhere else entirely?

Any help appreciated — we are very close, the connection is working, just need the runner to stay alive and keep accepting tasks.

2 Likes

@kanra, both questions answered:

1. Wildcards in allowed-env are NOT supported. The official n8n config uses explicit variable names only.

Update your allowed-env to list every variable explicitly, following the official pattern:

"allowed-env": [
  "PATH", "HOME", "NODE_OPTIONS",
  "N8N_RUNNERS_AUTO_SHUTDOWN_TIMEOUT",
  "N8N_RUNNERS_TASK_TIMEOUT",
  "N8N_RUNNERS_MAX_CONCURRENCY",
  "N8N_RUNNERS_TASK_BROKER_URI",
  "N8N_RUNNERS_AUTH_TOKEN",
  "NODE_FUNCTION_ALLOW_BUILTIN",
  "NODE_FUNCTION_ALLOW_EXTERNAL"
]

2. AUTO_SHUTDOWN_TIMEOUT in both places is correct: the container env passes it to the launcher, and env-overrides ensures it reaches the runner process. That part of your config is fine.

Try fixing the allowed-env list and see what happens!

2 Likes

Updated allowed-env to explicit variable names as suggested — same timeout pattern persists. 3 successes then permanent failure.

More importantly, looking at the debug logs carefully, the launcher logs stop completely after the first task is accepted. There is no log output after this point at all:

09:00:19 DEBUG [launcher:js] <- Received message `broker:taskofferaccept`
09:00:19 DEBUG [launcher:js] -> Sent message `runner:taskdeferred`
09:00:19 DEBUG [launcher:js] Disconnected: ws://<redacted>-n8n-worker:5679/runners/_ws?id=9dd2484b02dfba1b
09:00:19 DEBUG [launcher:js] Runner's task offer was accepted
09:00:19 DEBUG [launcher:js] Fetched grant token for runner
09:00:19 DEBUG [launcher:js] Task ready for pickup, launching runner...
09:00:19 DEBUG [launcher:js] Command: /usr/local/bin/node
09:00:19 DEBUG [launcher:js] Args: [/opt/runners/task-runner-javascript/dist/start.js]
09:00:19 DEBUG [launcher:js] Started monitoring runner health
09:00:21 DEBUG [runner:js] Health check server listening on 0.0.0.0, port 5681

Nothing after that — no reconnection to the broker, no new runner:taskoffer, no errors, no exit message. The launcher appears to stop producing any output after the runner process spawns.

The 3 successful executions suggest the runner process itself is working, but the launcher is not completing its reconnection loop after tasks finish.

Questions:

  1. After a task completes and the runner process exits, what should the launcher log next? We would expect to see it reconnect to the broker and send a new runner:taskoffer but we see nothing.
  2. Could the runner process be hanging rather than exiting cleanly after task completion, causing the launcher to wait indefinitely?
  3. Is there any known issue with the launcher stopping its reconnection loop after a certain number of tasks in v2.9.4?
1 Like