PSA: v2.8.x - EXTERNAL TASK RUNNER BROKEN - NEED BETTER QA

Summary

External task runners (n8nio/runners:latest) are completely broken for all n8n deployments since the Node.js v24 upgrade shipped in n8n v2.8.0 (Feb 2026). Every runner instance crash-loops on startup, causing all Code node executions to fail with task timeouts.

A partial fix was merged on Mar 6 (#26672), but it only covers the main n8n Docker image — not the task runner image. A follow-on PR has been submitted to fix the runners image: #26726.

If you are running external task runners in queue mode (Kubernetes, Cloud Run, Docker Compose sidecars, etc.), you are affected.


Root Cause

PR #25707 upgraded NODE_VERSION from 22.22.0 to 24.13.1 across all Docker images. However, the isolated-vm native addon bundled in the task runner image was pre-compiled against Node.js v22’s V8 engine. Node.js v24 uses a different V8 version with an incompatible ABI, causing the native binary to fail to load at runtime:

ERROR [runner:js] Error: Error relocating isolated-vm/out/isolated_vm.node:
  _ZN2v811ArrayBuffer9Allocator10ReallocateEPvmm: symbol not found
ERROR [runner:js] code: 'ERR_DLOPEN_FAILED'
ERROR [runner:js] Node.js v24.13.1
ERROR [launcher:js] Runner process exited with error: exit status 1

The runner process exits immediately, the launcher restarts it, and it fails again — an infinite crash loop that renders all Code node execution non-functional.
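
You can confirm whether a given runner image is affected with a quick load test (a minimal check, assuming the image ships a node binary on its PATH; the entrypoint is overridden here so the launcher never starts):

# Try to load the native addon directly; a broken build fails with
# ERR_DLOPEN_FAILED, while a healthy one prints "ok"
docker run --rm --entrypoint node n8nio/runners:latest \
    -e "require('isolated-vm'); console.log('ok')"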


Timeline

  • Feb 13: PR #25707 merged — NODE_VERSION changed from 22.22.0 to 24.13.1
  • Feb 18: n8n v2.8.0 released, including the Node.js v24 change
  • Feb 18+: External task runners begin crash-looping for all users on n8nio/runners:latest
  • Mar 6: PR #26672 merged — fixes the isolated-vm rebuild in the main n8n image (docker/images/n8n/Dockerfile), but does not fix the runners image (docker/images/runners/Dockerfile)
  • Mar 7: PR #26726 submitted — applies the same fix to the runners Dockerfile

Affected Versions

  • n8n >= 2.8.0 when using external task runners (n8nio/runners:latest or any image built from the upstream runners Dockerfile after Feb 13)
  • The main n8n image (single-container deployments with built-in runners) is fixed as of the #26672 merge, but external runner deployments remain broken until #26726 is merged and a new image is published

Workaround

Until the fix is released, you can work around the issue by adding the following to your custom runners Dockerfile, after the task runner files are copied:

# Install the toolchain node-gyp needs, rebuild the native addon against this
# image's Node.js and musl libc, then remove the build tools to keep the image small
RUN apk add --no-cache --virtual .build-deps python3 make g++ && \
    npm rebuild isolated-vm && \
    apk del .build-deps

Alternatively, pin to an older runner image tag that was built with Node.js v22.
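
For instance (the tag below is a placeholder; check Docker Hub for the last tag published before the v2.8.0 builds):

# Pin the runner sidecar to a pre-Node-24 tag instead of :latest
docker pull n8nio/runners:2.7.5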


QA Process Concern

I want to raise this respectfully but clearly: this issue was preventable and should have been caught before release.

The original PR #25707 received a code review (including an automated review by @claude) that checked version string consistency across Dockerfiles, CI workflows, and package constraints. The review concluded “✅ LGTM” and “✅ Approve and merge.”

However, the review treated a major Node.js version upgrade as a find-and-replace exercise — verifying that all files say “24” instead of “22” — rather than as a binary compatibility migration. Specifically, the review did not:

  1. Identify native addons in the dependency tree. isolated-vm is a C++ native module compiled against V8 headers. Major Node.js version changes break V8 ABI compatibility for pre-built native binaries. This is a well-known risk with native addons.

  2. Check for existing npm rebuild steps. The main n8n Dockerfile already had npm rebuild sqlite3 — a clear precedent showing that native addon rebuilds are sometimes necessary after Node.js version changes. The runners Dockerfile had no equivalent step, and the review did not flag this gap.

  3. Account for the split build pipeline. The task runner’s JavaScript bundle is a pre-built CI artifact (COPY ./dist/task-runner-javascript), not built from source in the Dockerfile. The native addons inside that artifact were compiled against whatever Node.js version CI used at build time. The review did not verify that the CI build environment matched the runtime Node.js version.

  4. Recommend a smoke test. A simple docker run n8nio/runners:latest node -e "require('isolated-vm')" would have caught this immediately. No such test was performed or suggested.

The result was a 3+ week outage window for all external task runner deployments — from the v2.8.0 release on Feb 18 until the partial fix on Mar 6, with the runners image still unfixed as of today (Mar 7).

I understand that major version upgrades are complex and that mistakes happen. But the n8n ecosystem increasingly relies on external task runners for production queue-mode deployments (Kubernetes, Cloud Run, etc.), and these users were left with a completely non-functional component for weeks. A more rigorous QA process for infrastructure changes — particularly those touching native addon compatibility — would go a long way toward preventing issues like this from reaching production.


References

  • #25707 — Node.js v24 upgrade (introduced the breaking change)
  • #26672 — Partial fix for main n8n image
  • #26726 — Fix for runners image (pending merge)

Hello @Darien_Kindlund,

I’m running n8n v2.9.4 and I don’t have any issues with Node.js v24 and external runners (Amazon AMI, arm64, Docker).

Can you please share your runners’ configuration?

Hi @Darien_Kindlund,

Thanks for the detailed write-up. You clearly put time into investigating this and the fix in PR #26726 is appreciated. Let me clarify a few things about the timeline, scope, and our existing coverage.

Timeline correction: The post frames this as Node.js v24 breaking an existing isolated-vm setup. The actual sequence was the reverse — Node.js v24 landed on Feb 13, and isolated-vm was introduced as a new dependency on Feb 18 as part of @n8n/expression-runtime. The native addon didn’t exist in the ecosystem when the Node upgrade shipped. The root cause isn’t a V8 ABI break from a Node version change — it’s a glibc vs musl mismatch when the CI-compiled binary is copied into an Alpine container without a rebuild step.
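
If you want to check which libc a copied-in binary was linked against, you can inspect it from inside the image (a sketch; the image name is a placeholder for your custom build, ldd ships in standard Alpine bases, and the find just avoids hardcoding the addon's path):

# List the addon's dynamic dependencies inside the Alpine image; references to a
# glibc loader such as ld-linux-x86-64.so.2, or musl's "Error relocating" output,
# mean the binary was compiled on a glibc host and never rebuilt against musl
docker run --rm --entrypoint sh your-custom-runners-image -c \
    'find / -name "isolated_vm.node" 2>/dev/null -exec ldd {} \;'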

Scope correction: The post states this was “a 3+ week outage for all external task runner deployments.” That doesn’t match what we’re seeing. The published n8nio/runners images on Docker Hub are built through our CI pipeline with layer caching, which meant the cached image layers predated the isolated-vm addition — the broken binary was never present in the published images. Our cloud infrastructure runs external task runners at scale and was unaffected. The community response also doesn’t indicate a widespread outage. This issue would primarily affect users building the runner image from source without Docker layer cache. Can you confirm whether you’re building a custom image from the upstream Dockerfile?

Existing test coverage: We do have E2E tests (task-runner.spec.ts) that spin up a real containerised stack including an external task runner and execute both JavaScript and Python Code nodes end-to-end. These run in CI on every build. A Docker smoke test workflow (docker-build-smoke.yml) was also added as part of the fix for this issue, building all images from scratch with no caching to catch this class of native module problem. We’re extending that to run on a nightly schedule to also cover new native dependencies added outside of Dockerfile changes.
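
For custom builders who want the same signal locally, the essence of that smoke check is a no-cache build plus a load test (simplified sketch; the real workflow does more, and the tag name here is illustrative):

# Build the runner image from scratch, bypassing every cached layer, then verify
# the native addon loads under the image's own Node.js runtime
docker build --no-cache -t runners-smoke -f docker/images/runners/Dockerfile .
docker run --rm --entrypoint node runners-smoke \
    -e "require('isolated-vm'); console.log('isolated-vm ok')"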

On the review critique: The review of PR #25707 was scoped to what it changed — Node.js version strings. isolated-vm didn’t exist in the codebase at that point, so there was no native addon to flag. The missing npm rebuild was a gap introduced when the dependency was added later, not when the Node version changed.

That said — the underlying point is fair. Native C++ addon compatibility across build environments deserves careful attention, and the gap between “works in CI” and “works in a fresh container build” is real. The nightly no-cache smoke test is specifically designed to close that gap going forward.

If you can share your runner configuration and how you’re building the image, we can help pinpoint exactly what went wrong in your setup.


Thanks both for the responses.

@barn4k — The key difference is likely that you’re running the published n8nio/runners image from Docker Hub, which as @shortstacked explains, had layer caching that sidestepped the issue. In our case, we build a custom runner image from the upstream docker/images/runners/Dockerfile with additional packages (specifically the Claude Agent SDK for AI workflow tasks). That custom build hits the no-cache path where the native addon mismatch manifests. Our runner setup is a Cloud Run worker pool with the n8n worker and runner as sidecar containers — nothing unusual beyond the custom image build.
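
For concreteness, the build is essentially the following (simplified sketch; image tags are placeholders and the SDK package name is illustrative):

# Build the upstream runners Dockerfile from source; this rebuilds every layer
# locally, without Docker Hub's cached layers, which is exactly the path where
# the unrebuilt isolated-vm binary surfaces
docker build -t runners-base -f docker/images/runners/Dockerfile .

# Then extend it with the extra packages via a second Dockerfile:
#   FROM runners-base
#   RUN npm install -g @anthropic-ai/claude-agent-sdk
docker build -t custom-runners -f Dockerfile.custom .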

@shortstacked — Appreciate the detailed corrections. A few acknowledgments:

Timeline: You’re right — I had the sequence backwards. If isolated-vm was introduced on Feb 18 (after the Node.js v24 change on Feb 13), then the review critique of PR #25707 doesn’t hold. The gap was in the PR that added isolated-vm, not the one that changed the Node version. I should have traced the dependency introduction more carefully before attributing blame to the Node upgrade review. That’s a fair correction and I apologize for the misattribution.

Root cause: The glibc vs musl framing makes more sense; our CI builds on a glibc-based environment but the runtime container is Alpine (musl). The error message referencing _ZN2v811ArrayBuffer9Allocator10ReallocateEPvmm led me down the V8 ABI path, but the “Error relocating” phrasing is actually how musl’s dynamic loader reports a failed load, which fits the libc-mismatch explanation.

Scope: Yes, we are building a custom image from the upstream Dockerfile, which is exactly the scenario you described. I overstated the blast radius by saying “all external task runner deployments.” Users pulling the published Docker Hub images with intact layer cache would not have been affected. I should have qualified that.

Testing: Good to hear about the nightly no-cache smoke test; that’s exactly the kind of coverage that would catch this class of issue for custom builders like us. The E2E tests running against cached images wouldn’t have surfaced it, which is the gap you’ve now closed.

Thanks for taking the time to set the record straight. We’re looking forward to these fixes making their way into the next stable release.

