A workflow I built is driving me crazy. It pulls data from our REST API, does some processing with Function nodes, and dumps everything into PostgreSQL. It works like a charm on my local setup, but on my server I get random timeout errors that make no sense.
The really frustrating part is how inconsistent it is. Sometimes it’ll run perfectly for 3-4 days straight, then suddenly start failing every single run. Weirdly, if I restart n8n it works again… for a while.
My workflow isn’t that complicated:
Schedule trigger running every 4 hours
HTTP Request grabbing stuff from our API
Function node that cleans up the data
Some Split/Merge operations because our data needs to be broken down
PostgreSQL node that saves everything
The errors I’m getting are super unhelpful:
“Gateway Timeout: Failed to execute operation after 3 retries” or sometimes just “ECONNRESET”
I’ve checked our API logs and the weird thing is, the requests aren’t even hitting the server when it fails. I also checked CPU and memory - all normal.
Have you noticed any pattern in the timing of these failures? Like, do they happen more often at certain times of day? Also, could you share your Docker config? I’m particularly interested in any timeout settings or resource limits you might have.
Could you add a couple of Function nodes - one right before your HTTP Request and one right after - just to log timestamps? Also, what does your HTTP Request node look like? Any specific timeout settings there? Oh, and roughly how much data are we talking about per run?
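For reference, here's roughly what those logging Function nodes could look like - a minimal sketch, not n8n-specific magic. The field names `stage` and `loggedAt` are just examples, and it's wrapped in a function here so it runs standalone (in an actual Function node you'd drop the body in directly and `return items;` at the top level, since n8n provides `items` for you).

```javascript
// Sketch of a timestamp-logging Function node body.
// In n8n, `items` is the incoming data; field names are illustrative.
function logTimestamps(items, stage) {
  const now = new Date().toISOString();
  for (const item of items) {
    item.json.stage = stage;     // e.g. 'before-http' / 'after-http'
    item.json.loggedAt = now;
  }
  console.log(`[${stage}] ${items.length} item(s) at ${now}`);
  return items;
}

// Simulated input so this runs outside n8n:
const sample = [{ json: { id: 1 } }, { json: { id: 2 } }];
const out = logTimestamps(sample, 'before-http');
```

Comparing the `before-http` and `after-http` timestamps across runs should show whether the stall happens before the request ever leaves, or while waiting on the response.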
The HTTP Request is pretty standard - just using all the default timeouts.
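Since it's all defaults, one thing worth trying is wrapping the flaky call in an explicit retry with backoff inside a Function node, so a single ECONNRESET doesn't kill the run. This is a generic sketch, not n8n's built-in retry behavior; the attempt count and delays are illustrative.

```javascript
// Sketch: retry a flaky async call with exponential backoff.
// Waits baseDelayMs, 2x, 4x, ... between attempts; values are examples.
async function withRetries(fn, attempts = 3, baseDelayMs = 500) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr; // all attempts exhausted
}
```

The backoff matters here: if the failures come from a congested gateway, retrying immediately just piles on more connections at the worst moment.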
Data-wise, we’re pulling around 2000-3000 records per run. Each record is about 5KB.
One of my workflows actually creates 10 separate HTTP requests per run, because I’m using pagination to fetch the data in chunks (300 records per page).
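Roughly, the pagination behaves like the loop below - a sketch only, where `fetchPage` stands in for the HTTP Request node and the page size of 300 matches what I described:

```javascript
// Sketch of the pagination loop: ~3000 records in pages of 300.
// `fetchPage(page, pageSize)` is a stand-in for the HTTP Request node.
async function fetchAll(fetchPage, pageSize = 300) {
  const all = [];
  let page = 0;
  while (true) {
    const batch = await fetchPage(page, pageSize);
    all.push(...batch);
    if (batch.length < pageSize) break; // short page means we're done
    page++;
  }
  return all;
}
```

Since the pages are fetched one after another, a stall on any single page holds up the whole run - which would fit the "request never reaches the server" symptom if the connection dies mid-sequence.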
My hosting provider has a firewall rule limiting each source IP to 100 concurrent connections. All my VMs also share the same outbound gateway, which gets congested during peak hours.
During busy hours, my network throughput drops below 1MB/s, but outside business hours, it stays at 10-20MB/s. This makes me think the issue is network congestion from shared infrastructure.