Database connection lost & N8N shutdown after SFTP timeout

Describe the problem/error/question

We are running n8n Business self-hosted on AWS ECS, using Postgres on RDS. We have an AWS ELB sitting in front of it and have already set N8N_PROXY_HOPS=1. We have had a series of container restarts, approximately every couple of hours. The logs show an SFTP timeout error coming from a failed workflow, followed by a series of alternating “Database connection timed out” and “Database connection recovered” messages, finally resulting in “Received SIGTERM. Shutting down…”.

What is the error message (if any)?

Here are some examples from our logs:

n8n-database-timeout-shutdown-example-1.csv

n8n-database-timeout-shutdown-example-2.csv

n8n-database-timeout-shutdown-example-3.csv

Please share your workflow/screenshots/recording

{
  "nodes": [
    {
      "parameters": {
        "protocol": "sftp",
        "operation": "list",
        "path": "/fresho_orders",
        "options": {}
      },
      "type": "n8n-nodes-base.ftp",
      "typeVersion": 1,
      "position": [
        16,
        -192
      ],
      "id": "195270b1-342f-4ed3-bf77-ac33c226632f",
      "name": "FTP",
      "credentials": {}
    }
  ],
  "connections": {
    "FTP": {
      "main": []
    }
  },
  "pinData": {},
  "meta": {
    "instanceId": "ef610acf512397dca651e5b5c1be697e6e28c40b43cf6389616d7126cb300b8b"
  }
}

{
  "nodes": [
    {
      "parameters": {
        "protocol": "sftp",
        "operation": "list",
        "path": "/fresho_exports/product_availability_standard_pricing/",
        "options": {}
      },
      "type": "n8n-nodes-base.ftp",
      "typeVersion": 1,
      "position": [
        -2304,
        160
      ],
      "id": "4d780249-0579-48f8-b218-835ae729739a",
      "name": "Get Fresho Product List1",
      "credentials": {}
    }
  ],
  "connections": {
    "Get Fresho Product List1": {
      "main": []
    }
  },
  "pinData": {},
  "meta": {
    "instanceId": "ef610acf512397dca651e5b5c1be697e6e28c40b43cf6389616d7126cb300b8b"
  }
}

{
  "nodes": [
    {
      "parameters": {
        "protocol": "sftp",
        "path": "/customer_list/customer_list.csv",
        "options": {}
      },
      "type": "n8n-nodes-base.ftp",
      "typeVersion": 1,
      "position": [
        -1904,
        128
      ],
      "id": "6593ff4e-b0c0-4857-960c-3c2809eaee3a",
      "name": "Get Suppliers list S3",
      "retryOnFail": true,
      "credentials": {}
    }
  ],
  "connections": {
    "Get Suppliers list S3": {
      "main": []
    }
  },
  "pinData": {},
  "meta": {
    "instanceId": "ef610acf512397dca651e5b5c1be697e6e28c40b43cf6389616d7126cb300b8b"
  }
}

Share the output returned by the last node

[details="instance information"]

Debug info

core

  • n8nVersion: 2.4.8

  • platform: docker (self-hosted)

  • nodeJsVersion: 22.21.1

  • nodeEnv: production

  • database: postgres

  • executionMode: regular

  • concurrency: -1

  • license: enterprise (production)

storage

  • success: all

  • error: all

  • progress: true

  • manual: true

  • binaryMode: filesystem

pruning

  • enabled: true

  • maxAge: 336 hours

  • maxCount: 10000 executions

client

  • userAgent: mozilla/5.0 (windows nt 10.0; win64; x64; rv:147.0) gecko/20100101 firefox/147.0

  • isTouchDevice: false

Generated at: 2026-02-17T04:42:00.945Z
[/details]

Hey @cruggles,
Welcome to the community!!

I checked all three CSV log samples and they show the same pattern: first an external connection issue (examples #1 and #2 show “Timed out while waiting for handshake”), then n8n starts logging “Database connection timed out” / “Database connection recovered” repeatedly (example #2 even fails to save execution progress because it can’t reach Postgres), and shortly after that n8n logs “Received SIGTERM. Shutting down…”. That SIGTERM is the key detail: it almost always means ECS is stopping/restarting the task, typically because it is being marked unhealthy during those DB timeout windows, not because n8n “decided to exit”.

The fix in setups like yours is to review the ALB/ECS health check path and thresholds. If your health check hits a DB-dependent endpoint (like readiness), short RDS/network hiccups will flip it to unhealthy and ECS will recycle the container. Switching the health check to a basic “service is up” endpoint (or relaxing the timeout/unhealthy threshold and adding a grace period) usually stops the restart loop immediately. After that, tune your n8n Postgres connection/pool timeout env vars so the DB connection doesn’t flap as much during brief slowdowns, and if you keep seeing “Offer expired” under load, increase the runner task request timeout so n8n has longer to accept runner work.
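For reference, these are the kind of Postgres pool/timeout settings I mean, in env-file form for the ECS task definition. The variable names are the standard n8n ones; the values here are illustrative starting points to tune against your RDS instance, not tested recommendations:

```shell
# Illustrative n8n Postgres connection settings (tune to your load)
DB_POSTGRESDB_POOL_SIZE=10                    # connections per container
DB_POSTGRESDB_CONNECTION_TIMEOUT=30000        # ms to wait when opening a connection
DB_POSTGRESDB_IDLE_CONNECTION_TIMEOUT=60000   # ms before an idle connection is closed
```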

let me know if this works out :blush:


Thanks @Mayank1024 this is really useful. I am using the /healthz endpoint for my ELB health checks. From the documentation it seems that this is not database-dependent? Is there another health check endpoint I should be using?

I’ll adjust the database connection variables as well to see if that improves the situation.

Thanks,
C.J.

Hey @cruggles,

I’m glad it helped.

You’re reading it right, /healthz isn’t DB-dependent, it’s basically “is the n8n process up”. The DB-dependent one is /healthz/readiness (it can flip unhealthy when Postgres drops), so if you’re already on /healthz, you don’t need to change the path.
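If you want to see the difference yourself, you can hit both endpoints from inside the container (assuming the default n8n port 5678; adjust host/port to your setup):

```shell
# Liveness: returns 200 as long as the n8n process is up (no DB involved)
curl -fsS http://localhost:5678/healthz

# Readiness: also checks that n8n can reach the database,
# so this is the one that can flip during Postgres hiccups
curl -fsS http://localhost:5678/healthz/readiness
```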

At this point I’d look at two things in parallel: (1) the ECS task stop reason / ALB target health events at the same timestamps as the DB timeout logs (to confirm what’s triggering the SIGTERM), and (2) tuning the Postgres connection/pool settings like you said, since your logs show the DB connection flapping and even failing to save execution progress. If you paste your current DB env vars (pool size + connection/idle timeouts) and the ECS “stopped reason” event, I can suggest specific values to try; or if it stays stable as-is, even better.
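Something like this should surface the stop reason (the cluster name and task ID are placeholders for your setup):

```shell
# List recently stopped tasks in the cluster
aws ecs list-tasks --cluster my-n8n-cluster --desired-status STOPPED

# Show why ECS stopped a given task
# (e.g. "Task failed ELB health checks in (target-group ...)")
aws ecs describe-tasks \
  --cluster my-n8n-cluster \
  --tasks <task-id> \
  --query 'tasks[].{stopCode:stopCode,stoppedReason:stoppedReason}'
```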

I’ve set the following environment parameters:

DB_POSTGRESDB_POOL_SIZE=10
DB_POSTGRESDB_CONNECTION_TIMEOUT=60000
DB_POSTGRESDB_IDLE_CONNECTION_TIMEOUT=120000

I’ve also changed the ELB health check to an interval of 10 seconds, a timeout of 9 seconds, and an unhealthy threshold count of 10.
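In case it helps anyone else, the equivalent AWS CLI calls would look roughly like this (the target group ARN, cluster, and service names are placeholders):

```shell
# Relax the ALB target group health check
aws elbv2 modify-target-group \
  --target-group-arn <target-group-arn> \
  --health-check-path /healthz \
  --health-check-interval-seconds 10 \
  --health-check-timeout-seconds 9 \
  --unhealthy-threshold-count 10

# Optionally give new tasks a grace period before health checks count
aws ecs update-service \
  --cluster my-n8n-cluster \
  --service my-n8n-service \
  --health-check-grace-period-seconds 120
```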

n8n has been running with no problems for about 5 hours now, so hopefully the problem is solved. Will continue to monitor.

Once again, thank you for your help :tada:


Awesome, that’s a great sign: those DB pool/timeout settings plus the relaxed health check thresholds are exactly the kind of changes that stop this restart loop. If it stays stable over the next 12–24 hours, you’re probably in the clear.

If this resolved it, could you mark the reply as Solution? It’ll help others who hit the same error find the fix quickly.

Thank you!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.