Connection aborted - error reading from instance, CloudRun vs CloudSQL

Describe the problem/error/question

We host n8n on GCP Cloud Run, connected to a Cloud SQL database. At some point the connection between the two was aborted, resulting in a “Database not ready” message.
Fortunately I had enabled alerting on the readiness check, so I got the message. This is already the second time it has happened. Redeploying the Cloud Run service helped, but that's obviously not a real solution.
CloudRun runs with:

cpu_idle: false
Execution environment: Second Generation
Startup CPU Boost: true
2 CPUS, 4GiB memory

What is the error message (if any)?

“Database not ready”

DEBUG logs when it started happening:

DEFAULT 2026-02-12T09:47:17.368026Z 2026-02-12T09:47:17.367Z | debug | Querying database for waiting executions {"scopes":["waiting-executions"],"file":"wait-tracker.js","function":"getWaitingExecutions"}

INFO 2026-02-12T09:47:29.229227Z [httpRequest.requestMethod: GET] [httpRequest.status: 200] [httpRequest.responseSize: 193 B] [httpRequest.latency: 1 ms] [httpRequest.userAgent: GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)] https://<n8n-url>/healthz/readiness

DEFAULT 2026-02-12T09:47:39.879757Z 2026/02/12 09:47:39 [db-name] connection aborted - error reading from instance: read tcp IP->DB_IP: read: connection reset by peer

DEFAULT 2026-02-12T09:47:39.880998Z 2026-02-12T09:47:39.880Z | error | Connection terminated unexpectedly {"file":"error-reporter.js","function":"defaultReport"}

DEFAULT 2026-02-12T09:47:39.881175Z 2026-02-12T09:47:39.880Z | error | Connection terminated unexpectedly {"file":"error-reporter.js","function":"defaultReport"}

DEFAULT 2026-02-12T09:47:46.882458Z 2026-02-12T09:47:46.881Z | warn | Database connection timed out {"file":"db-connection.js","function":"ping"}

ERROR 2026-02-12T09:47:46.947087Z [httpRequest.requestMethod: GET] [httpRequest.status: 503] [httpRequest.responseSize: 197 B] [httpRequest.latency: 1 ms] [httpRequest.userAgent: GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)] https://<n8n-url>/healthz/readiness

DEFAULT 2026-02-12T09:47:53.883948Z 2026-02-12T09:47:53.883Z | warn | Database connection timed out {"file":"db-connection.js","function":"ping"}

Share the output returned by the last node

Information on your n8n setup

  • n8n version: 2.7.3
  • Database (default: SQLite): CloudSQL
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): CloudRun
  • Operating system: GCP

Hello @rgrzesk ,

Cloud SQL kills the connection because it was idle, but n8n's connection pool doesn't realize it. When n8n then tries to use that “dead” connection from its pool to check for waiting executions, it crashes.
If you are currently connecting via Private IP (TCP), you will constantly fight network timeouts. The recommended way to connect Cloud Run to Cloud SQL is via the built-in Unix domain socket.

This offloads the connection management to a Google-managed sidecar that handles keep-alives and reconnects automatically.
How to switch:

  1. Cloud Run config: Go to “Edit & Deploy New Revision” → “Container, Networking, Security” → “Integrations” (or the “Cloud SQL” tab in older UIs).
  2. Add Connection: Select your Cloud SQL instance. This mounts it at /cloudsql/INSTANCE_CONNECTION_NAME.
  3. Update n8n Env Vars:
  • DB_TYPE: postgresdb (assuming Postgres)
  • DB_POSTGRESDB_HOST: /cloudsql/YOUR_PROJECT:REGION:INSTANCE_NAME (Do not use the IP address).
  • DB_POSTGRESDB_USER / PASSWORD: (Keep as is).
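The steps above can also be expressed directly in Terraform. A minimal sketch, assuming the `google_cloud_run_v2_service` resource and a placeholder instance connection name (`YOUR_PROJECT:REGION:INSTANCE_NAME`) — adapt names and region to your setup:

```hcl
resource "google_cloud_run_v2_service" "n8n" {
  name     = "n8n"          # hypothetical service name
  location = "europe-west1" # assumption: adjust to your region

  template {
    # Mount the Cloud SQL instance as a Unix domain socket under /cloudsql
    volumes {
      name = "cloudsql"
      cloud_sql_instance {
        instances = ["YOUR_PROJECT:REGION:INSTANCE_NAME"]
      }
    }

    containers {
      image = "docker.n8n.io/n8nio/n8n"

      volume_mounts {
        name       = "cloudsql"
        mount_path = "/cloudsql"
      }

      # Point n8n at the socket path instead of an IP address
      env {
        name  = "DB_TYPE"
        value = "postgresdb"
      }
      env {
        name  = "DB_POSTGRESDB_HOST"
        value = "/cloudsql/YOUR_PROJECT:REGION:INSTANCE_NAME"
      }
    }
  }
}
```

With this in place the Postgres driver connects through the socket file, and Google's managed proxy handles the actual TCP session to the database.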

This usually eliminates “connection reset” errors entirely because the socket file doesn’t suffer from TCP network timeouts.

Let me know if you must stay on TCP/IP and need a workaround for that case as well.
I am here to help.

Thank you for the reply.

I am already using Cloud SQL Connection:

Terraform:
(screenshot of the Terraform Cloud SQL connection configuration)

I have never used a TCP connection for the Cloud Run n8n instance, so I think that's not the issue.

As mentioned, it's not the first time this has happened, which is why I enabled DEBUG logs, but I don't see anything more specific than the aborted connection.

Since Cloud Run instances often handle lower traffic or single concurrency, the default n8n connection pool is likely too large, leaving multiple connections sitting idle until Cloud SQL silently kills them. To fix this “rotting connection” issue, you should set the environment variable DB_POOL_SIZE=2 (or a maximum of 5). This forces n8n to recycle a smaller number of connections much more actively, ensuring they stay “fresh” and preventing the server from trying to reuse a dead connection that causes the crash.

I guess you are referring to Database environment variables | n8n Docs and the DB_POSTGRESDB_POOL_SIZE variable. I keep it at the default, so it's already set to 2.
Explicitly setting it to 2 won't have any effect, I guess.

OK then try creating a “Heartbeat” workflow that forces the database connection to stay alive.
Simply set up a Schedule Trigger to run every minute and connect it to a Postgres node executing a lightweight SELECT 1 query; this constant activity keeps the connection pool busy, preventing Cloud SQL from ever seeing it as “idle”.

would this work for you?

Unfortunately that's a very tricky solution for me. I run a setup of several n8n instances with different, isolated users (different projects). To most of them I don't have access (from the user perspective).

To achieve that I would need something like a predefined workflow that gets set up while deploying the n8n Cloud Run instance with Terraform. Is that somehow possible?

You can configure a liveness probe directly in the Terraform configuration.
By adding a liveness_probe that points to n8n's /healthz/readiness endpoint, Google Cloud will automatically ping your instance every few seconds. This endpoint runs a database query, effectively acting as the “heartbeat” that keeps the connection pool active without any extra setup.
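In Terraform, such a probe looks roughly like the following — a sketch assuming the `google_cloud_run_v2_service` resource; the probe timings are illustrative, not prescriptive:

```hcl
resource "google_cloud_run_v2_service" "n8n" {
  name     = "n8n"          # hypothetical service name
  location = "europe-west1" # assumption: adjust to your region

  template {
    containers {
      image = "docker.n8n.io/n8nio/n8n"

      # Cloud Run pings this endpoint from inside the instance and
      # restarts the container if it keeps failing.
      liveness_probe {
        http_get {
          path = "/healthz/readiness"
        }
        initial_delay_seconds = 30
        period_seconds        = 15 # every 15 s, well under typical idle timeouts
        failure_threshold     = 3
      }
    }
  }
}
```

Unlike an external uptime check, the probe runs against every instance, including ones not currently serving traffic.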

Isn't that the same effect we currently have with the Uptime Check calling the /healthz/readiness endpoint every 5 minutes? As you can see from the logs, we were calling it, but the error occurred anyway. How would the probe be different? And why does it have to be called every few seconds? What happens under the hood if we miss a hit within the 5-minute window?

In the screenshot Google explicitly states that their infrastructure kills idle connections to save resources and recommends a 60-second keepalive to prevent it. So by hitting the DB every 15 seconds we are doing exactly what they suggest: forcing traffic through the pipe to reset that idle timer before it cuts us off.

Maybe this is your issue — it affects anything from 2.7.x onwards:

Interesting: we also see GET /healthz/readiness -> 200 just a couple milliseconds before the DB socket is reset, then n8n logs connection reset by peer and readiness flips to 503.
That suggests readiness polling isn’t a reliable DB keepalive: it may reflect cached/background status and/or it may not touch the same pooled connection that later gets reused and fails.

Also it is mentioned here:

Additionally, I found information that DB checks are done in the background every 2 seconds:

I already had that issue with 2.6.x some time ago, but I'll keep an eye on it. Thanks!

It happened again, without any apparent cause, even though a fresh instance with the new version 2.9.0 had been deployed just the day before. Based on the logs, the /readiness endpoint is checked every few seconds, which I think is enough. Additionally we run the uptime check every 5 minutes, so we have plenty of mechanisms keeping the machine alive.