Workflow randomly timing out in prod but works fine on my machine

A workflow I built is driving me crazy. It pulls data from our REST API, does some processing with Function nodes, and dumps everything into PostgreSQL. It works like a charm on my local setup, but on my server I get random timeout errors that make no sense.

The really frustrating part is how inconsistent it is. Sometimes it’ll run perfectly for 3-4 days straight, then suddenly start failing every single run. Weirdly, if I restart n8n it works again… for a while.

My workflow isn’t that complicated:

  1. Schedule trigger running every 4 hours
  2. HTTP Request grabbing stuff from our API
  3. Function node that cleans up the data
  4. Some Split/Merge operations because our data needs to be broken down
  5. PostgreSQL node that saves everything
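For reference, the Function node in step 3 does something along these lines (a simplified sketch; the field names `name` and `email` are placeholders, not my real schema):

```javascript
// Simplified version of the cleanup Function node (step 3).
// Field names are placeholders for illustration only.
function cleanItems(items) {
  return items.map((item) => ({
    json: {
      name: (item.json.name || '').trim(),
      email: (item.json.email || '').toLowerCase(),
    },
  }));
}

// Inside the actual n8n Function node this is just:
//   return cleanItems(items);
const sample = [{ json: { name: '  Ada ', email: 'Ada@Example.com' } }];
console.log(cleanItems(sample));
```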

The errors I’m getting are super unhelpful:
“Gateway Timeout: Failed to execute operation after 3 retries” or sometimes just “ECONNRESET”

I’ve checked our API logs and the weird thing is, the requests aren’t even hitting the server when it fails. I also checked CPU and memory - all normal.

My setup:

  • n8n 1.75.0
  • Running in Docker
  • PostgreSQL 13
  • 8GB RAM (should be plenty!)
  • Running on a shared hosting provider

It looks like your topic is missing some important information. Could you provide the following, if applicable?

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

Have you noticed any pattern with the timing of these failures? Like, do they happen more often at certain times of day? Also, could you share your Docker config? I’m particularly interested in any timeout settings or resource limits you might have.


Nothing exact, but definitely more common between 11 am and 3 pm.

Here’s my Docker setup:

version: '3'
services:
  n8n:
    image: n8nio/n8n:1.75.0
    restart: always
    environment:
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres-db
      - WEBHOOK_URL=https://n8n.asdfghjkl.com/
    ports:
      - "5678:5678"
    volumes:
      - n8n_data:/home/node/.n8n
    depends_on:
      - postgres-db
    networks:
      - internal

I actually tried adding N8N_DEFAULT_TIMEOUT=60000 yesterday thinking that might help. Still getting random failures.

Could you add a couple of Function nodes - one right before your HTTP Request and one right after - just to log timestamps? Also, what does your HTTP Request node look like? Any specific timeout settings there? Oh, and roughly how much data are we talking about per run?
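The timing probes suggested above could look something like this (a sketch; the `label` values are arbitrary):

```javascript
// A Function node body that logs a timestamp and stamps each item,
// so you can compare "before" and "after" times across the HTTP Request.
function stampItems(items, label) {
  const now = new Date().toISOString();
  console.log(`[${label}] ${now} - ${items.length} item(s)`);
  return items.map((item) => ({
    json: { ...item.json, [label + 'At']: now },
  }));
}

// In the node before the request:  return stampItems(items, 'beforeRequest');
// In the node after the request:   return stampItems(items, 'afterRequest');
```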

The HTTP Request is pretty standard - just using all the default timeouts.
Data-wise, we’re pulling around 2000-3000 records per run; each record is about 5KB.
One of my workflows actually makes 10 separate HTTP requests per run because I’m using pagination to fetch data in chunks (300 records per page).

  1. Does your Docker container have any connection limits configured somewhere?
  2. Have you had a chance to check network usage during these runs?

Try adding these to your Docker config:

- N8N_METRICS=true
- N8N_METRICS_PORT=9229

My hosting provider has a firewall rule that limits each source IP to 100 concurrent connections. All my VMs also share the same outbound gateway, which gets congested during peak hours.

During busy hours, my network throughput drops below 1MB/s, but outside business hours, it stays at 10-20MB/s. This makes me think the issue is network congestion from shared infrastructure.

If that is the issue, try these solutions:

  1. Add connection pooling to your HTTP Request node:
  • In “Options” tab, set “Max Connections” to 5
  • Add “Keep-Alive” header set to “timeout=5, max=5”
  2. Add retry logic with backoff - in HTTP Request advanced options, set:
  • “Retry On Fail” = true
  • “Max Tries” = 5
  • “Retry Interval” = 3000
  3. Add to your Docker environment:
- N8N_REQUEST_MAX_CONNECTIONS=10
- DB_POSTGRESDB_POOL_MIN=2
- DB_POSTGRESDB_POOL_MAX=10
- N8N_DB_POSTGRESDB_CONNECTION_TIMEOUT=30000
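Note that the built-in “Retry Interval” is a fixed delay rather than true exponential backoff. If you want the wait to grow between attempts, a Function/Code node could implement it along these lines (a sketch of the general technique, not an n8n built-in):

```javascript
// Exponential backoff: double the delay on each failed attempt, capped.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  // attempt 0 -> 1s, 1 -> 2s, 2 -> 4s, ... capped at maxMs
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry a request function up to maxTries times, backing off between tries.
async function withRetry(fn, maxTries = 5) {
  for (let attempt = 0; attempt < maxTries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxTries - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```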

Okay, I will try that and let you know how it goes.

If the solution I provided solved your issue, please consider marking the comment as the solution.


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.