A workflow I built is driving me crazy. It pulls data from our REST API, does some processing with Function nodes, and dumps everything into PostgreSQL. It works like a charm on my local setup, but on my server I get random timeout errors that make no sense.
The really frustrating part is how inconsistent it is. Sometimes it’ll run perfectly for 3-4 days straight, then suddenly start failing every single run. Weirdly, if I restart n8n it works again… for a while.
My workflow isn’t that complicated:
Schedule trigger running every 4 hours
HTTP Request grabbing stuff from our API
Function node that cleans up the data
Some Split/Merge operations because our data needs to be broken down
PostgreSQL node that saves everything
The errors I’m getting are super unhelpful:
“Gateway Timeout: Failed to execute operation after 3 retries” or sometimes just “ECONNRESET”
I’ve checked our API logs and the weird thing is, the requests aren’t even hitting the server when it fails. I also checked CPU and memory - all normal.
Have you noticed any pattern in the timing of these failures? Like, do they happen more often at certain times of day? Also, could you share your Docker config? I’m particularly interested in any timeout settings or resource limits you might have.
Could you add a couple of Function nodes - one right before your HTTP Request and one right after - just to log timestamps? Also, what does your HTTP Request node look like? Any specific timeout settings there? Oh, and roughly how much data are we talking about per run?
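For reference, here's roughly what those logging Function nodes could look like - a minimal sketch, not n8n-specific magic. The field names `stage` and `loggedAt` are just examples, and it's wrapped in a function here so it runs standalone (in an actual Function node you'd drop the body in directly and `return items;` at the top level, since n8n provides `items` for you).

```javascript
// Sketch of a timestamp-logging Function node body.
// In n8n, `items` is the incoming data; field names are illustrative.
function logTimestamps(items, stage) {
  const now = new Date().toISOString();
  for (const item of items) {
    item.json.stage = stage;     // e.g. 'before-http' / 'after-http'
    item.json.loggedAt = now;
  }
  console.log(`[${stage}] ${items.length} item(s) at ${now}`);
  return items;
}

// Simulated input so this runs outside n8n:
const sample = [{ json: { id: 1 } }, { json: { id: 2 } }];
const out = logTimestamps(sample, 'before-http');
```

Comparing the `before-http` and `after-http` timestamps across runs should show whether the stall happens before the request ever leaves, or while waiting on the response.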
The HTTP Request is pretty standard - just using all the default timeouts.
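Since it's all defaults, one thing worth trying is wrapping the flaky call in an explicit retry with backoff inside a Function node, so a single ECONNRESET doesn't kill the run. This is a generic sketch, not n8n's built-in retry behavior; the attempt count and delays are illustrative.

```javascript
// Sketch: retry a flaky async call with exponential backoff.
// Waits baseDelayMs, 2x, 4x, ... between attempts; values are examples.
async function withRetries(fn, attempts = 3, baseDelayMs = 500) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr; // all attempts exhausted
}
```

The backoff matters here: if the failures come from a congested gateway, retrying immediately just piles on more connections at the worst moment.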
Data-wise, we’re pulling around 2000-3000 records per run. Each record is about 5KB.
One of my workflows actually creates 10 separate HTTP requests per run, because I’m using pagination to fetch the data in chunks (300 records per page).
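Roughly, the pagination behaves like the loop below - a sketch only, where `fetchPage` stands in for the HTTP Request node and the page size of 300 matches what I described:

```javascript
// Sketch of the pagination loop: ~3000 records in pages of 300.
// `fetchPage(page, pageSize)` is a stand-in for the HTTP Request node.
async function fetchAll(fetchPage, pageSize = 300) {
  const all = [];
  let page = 0;
  while (true) {
    const batch = await fetchPage(page, pageSize);
    all.push(...batch);
    if (batch.length < pageSize) break; // short page means we're done
    page++;
  }
  return all;
}
```

Since the pages are fetched one after another, a stall on any single page holds up the whole run - which would fit the "request never reaches the server" symptom if the connection dies mid-sequence.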
My hosting provider has a firewall rule limiting each source IP to 100 concurrent connections. All my VMs also share the same outbound gateway, which gets congested during peak hours.
During busy hours, my network throughput drops below 1MB/s, but outside business hours, it stays at 10-20MB/s. This makes me think the issue is network congestion from shared infrastructure.