Azure OpenAI chat model - intermittent Connection error

Describe the problem/error/question

Azure OpenAI Chat Model Error

I have a self-hosted n8n instance and use the Azure OpenAI chat model for my AI agent. For some of the workflows, the chat model fails intermittently with a Connection error.

Error:

2026-02-25T19:19:37.955Z | debug | Workflow execution finished with error {"error":{"level":"warning","tags":{"reWrapped":true},"timestamp":1772047177953,"context":{},"functionality":"regular","name":"NodeOperationError","node":{"parameters":{"aiAgentStarterCallout":"","promptType":"auto","text":"={{ $json.chatInput }}","hasOutputParser":false,"needsFallback":false,"options":{"systemMessage":"**System Instructions:** \nprompt"}},"type":"@n8n/n8n-nodes-langchain.agent","typeVersion":3.1,"position":[560,0],"id":"id","name":"AI Agent"},"messages":["Connection error."],"message":"Connection error.","stack":"NodeOperationError: Connection error.\n at /usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+n8n-nodes-langchain@file+packages+@n8n+nodes-langchain_25556d2df217b3ecea67b49c8dcb0d5a/node_modules/@n8n/n8n-nodes-langchain/nodes/agents/Agent/agents/ToolsAgent/V3/helpers/executeBatch.ts:113:11\n at Array.forEach ()\n at executeBatch (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+n8n-nodes-langchain@file+packages+@n8n+nodes-langchain_25556d2df217b3ecea67b49c8dcb0d5a/node_modules/@n8n/n8n-nodes-langchain/nodes/agents/Agent/agents/ToolsAgent/V3/helpers/executeBatch.ts:102:15)\n at processTicksAndRejections (node:internal/process/task_queues:105:5)\n at ExecuteContext.toolsAgentExecute (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+n8n-nodes-langchain@file+packages+@n8n+nodes-langchain_25556d2df217b3ecea67b49c8dcb0d5a/node_modules/@n8n/n8n-nodes-langchain/nodes/agents/Agent/agents/ToolsAgent/V3/execute.ts:46:66)\n at ExecuteContext.execute (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+n8n-nodes-langchain@file+packages+@n8n+nodes-langchain_25556d2df217b3ecea67b49c8dcb0d5a/node_modules/@n8n/n8n-nodes-langchain/nodes/agents/Agent/V3/AgentV3.node.ts:139:10)\n at WorkflowExecute.executeNode (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/n8n-core@file+packages+core_@[email protected]_@opentelemetry+exporter-trace-otlp_4dbefa9881a7c57a9e05a20ce4387c10/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1039:8)\n at WorkflowExecute.runNode (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/n8n-core@file+packages+core_@[email protected]_@opentelemetry+exporter-trace-otlp_4dbefa9881a7c57a9e05a20ce4387c10/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1218:11)\n at /usr/local/lib/node_modules/n8n/node_modules/.pnpm/n8n-core@file+packages+core_@[email protected]_@opentelemetry+exporter-trace-otlp_4dbefa9881a7c57a9e05a20ce4387c10/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1653:27\n at /usr/local/lib/node_modules/n8n/node_modules/.pnpm/n8n-core@file+packages+core_@[email protected]_@opentelemetry+exporter-trace-otlp_4dbefa9881a7c57a9e05a20ce4387c10/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:2296:11"},"workflowId":"cV3IXtIFJdzLwYCJ","file":"logger-proxy.js","function":"exports.debug"}
    

Additional Data:

{
  "messages": [
    "System: **System Instructions:**"
  ],
  "estimatedTokens": 782,
  "options": {
    "model": "gpt-5-mini",
    "timeout": 120000,
    "max_retries": 2,
    "configuration": {
      "fetchOptions": {
        "dispatcher": {
          "_events": {},
          "_eventsCount": 0
        }
      }
    },
    "model_kwargs": {}
  }
}
    

What is the error message (if any)?

Please share your workflow

Share the output returned by the last node

Information on your n8n setup

  • n8n version: 2.7.4
  • Database (default: SQLite): Postgres
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
  • Operating system: RHEL

Hi @tej !
I would recommend checking your Azure usage metrics to see if you’re hitting burst limits during execution.

Hi,

Configured limits for my model (gpt-5-mini):

  • Rate limit (Tokens per minute): 1,579,000
  • Rate limit (Requests per minute): 1,579

When I checked the metrics, the usage never hit these limits.

Hi @tej, I recommend using the official OpenAI node and connection for a more reliable service. Azure OpenAI can sometimes become unstable under high usage.

@Anshul_Namdev

when using the OpenAI chat model node I am facing the below error:

"data":{"error":{"code":"404","message":"Resource not found"}}

base URL provided -> https://resource-name.openai.azure.com/openai

I have tried multiple endpoints but face the same issue.

Is this compatible?


Hi @tej

since you aren’t hitting rate limits, the issue is most likely a timeout or a dropped network connection between your docker container and azure. looking at your workflow json, you have a Grafana MCP tool connected to that agent. grafana queries can be slow sometimes. if the agent is waiting on that tool to return heavy data, the underlying http connection to azure might just be timing out and dropping while it waits.

i’d try flipping on the Retry On Fail toggle in the AI Agent node. if it’s just a random network blip or a slightly slow tool execution, a quick retry usually forces it right through without breaking the whole workflow.
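the toggle just re-runs the node, but the underlying pattern is a plain bounded retry; a minimal sketch of the idea (the attempt count and delay here are arbitrary, not n8n's actual defaults):

```javascript
// Bounded-retry sketch: re-run a flaky async call a few times with a
// short pause between attempts, rethrowing once attempts are exhausted.
async function withRetry(fn, attempts = 3, delayMs = 1000) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn(); // success: return immediately
    } catch (err) {
      lastErr = err; // remember the failure and maybe try again
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastErr; // all attempts failed
}
```

a transient "Connection error." usually succeeds on the second attempt with this kind of wrapper, which is exactly what the Retry On Fail toggle does for you.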

as for that 404 error, you’re putting https://resource-name.openai.azure.com/openai as your endpoint in the credentials. n8n actually builds the final api path for you, so it’s taking your url and automatically appending another /openai or deployment string to the end of it. that’s exactly why azure is throwing a 404 back at you: the path doesn’t exist.

just strip it back to the absolute base url: https://resource-name.openai.azure.com/.

once you save that clean url in your credentials, it should connect perfectly.
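to make the doubling concrete, here's roughly how a client composes the final Azure chat-completions URL from the base URL (the helper name is made up for illustration; the path shape matches Azure's documented format):

```javascript
// Sketch: Azure OpenAI chat-completions URLs follow
//   <base>/openai/deployments/<deployment>/chat/completions?api-version=<ver>
// so a base URL that already ends in "/openai" yields a doubled segment.
function azureChatUrl(baseUrl, deployment, apiVersion) {
  const base = baseUrl.replace(/\/+$/, ''); // strip trailing slashes
  return `${base}/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`;
}

// Clean base URL -> valid path:
console.log(azureChatUrl('https://resource-name.openai.azure.com/', 'gpt-5-mini', '2025-04-01-preview'));
// Base URL with a stray "/openai" -> ".../openai/openai/..." which Azure 404s:
console.log(azureChatUrl('https://resource-name.openai.azure.com/openai', 'gpt-5-mini', '2025-04-01-preview'));
```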

Hi @tej, have you tried a different model on that service? Does it give the same error? This should not be happening, and a 404 is a critical one; I recommend switching to OpenRouter if this is in production.

@A_A4 @Anshul_Namdev

Thanks for checking on this.

I was not able to replicate the Azure node connection error locally, so I suspect an issue with my LB configuration, possibly a timeout setting.

Will update if I find the root cause


Hi, I tried but was not able to find the issue as the config looks correct.

request your help here @Anshul_Namdev @A_A4

my setup

domain -> HTTPS LB -> n8n service on k8s

out of 5 requests 2 requests are failing with connection error

Tested azure openAi connection from n8n container and all the requests passed.

Not sure where this is going wrong

these are my env vars

- name: N8N_PROTOCOL
  value: https
- name: N8N_PORT
  value: "5678"
- name: N8N_EDITOR_BASE_URL
  value: https://domain-name
- name: WEBHOOK_URL
  value: https://domain-name
- name: N8N_PUSH_BACKEND
  value: sse
- name: N8N_HOST
  value: domain-name
- name: N8N_SECURE_COOKIE
  value: "true"
- name: N8N_PROXY_HOPS
  value: "1"

@Anshul_Namdev Yes, I have tried a different model as well, but the issue still persists.

FYI, I am now using the OpenAI chat node with AZOAI gpt-5 model endpoints.

But the Connection error persists.

Hi @tej
Can you confirm the exact endpoint format you’re using, including api-version, and whether the same deployment works consistently outside of n8n under concurrent load?

@A_A4

When I executed the DNS lookup command, every 3rd/4th DNS lookup failed.

the DNS lookup command:

node -e "require('dns').lookup('AZOAI', (err, address, family) => { if (err) { console.error('DNS lookup failed:', err.code, err.message); process.exitCode=1; return; } console.log('Resolved IP:', address, '| Family: IPv' + family); });"

the successful DNS lookup output:

Resolved IP: 20.***.**.*** | Family: IPv4

the failing output:

DNS lookup failed: EAI_AGAIN getaddrinfo EAI_AGAIN AZOAI

but if I restrict it to the IPv4 family, no DNS resolution error is found even for n requests:

node -e "require('dns').lookup('AZOAI',{family:4},(e,a,f)=>{if(e) console.error(e.code,e.message); else console.log(a,f)})"

I have set NODE_OPTIONS as suggested, but the Connection error is still observed in the n8n workflow.

FYI, I have checked further on the coredns pods and the outbound rules for VPC NACL, but everything seems fine.

Can you please help me understand why NODE_OPTIONS was not working?
Or is there any other issue that you foresee?

Hi @tamy.santos

Endpoint: https://resource-name.openai.azure.com/
API version: 2025-04-01-preview
Model (Deployment) Name: gpt-5-mini

Yes, the deployment works fine on the local n8n setup.
But when trying from my k8s cluster setup, every 3rd or 4th request to AZOAI fails.

OK!
The cleanest solution is to fix the DNS behavior at the cluster level, either by stabilizing CoreDNS or by configuring the cluster or node resolver to prefer IPv4, rather than trying to solve this inside n8n. @tej

@A_A4 @tamy.santos

I am not so familiar with Node.js, but will setting NODE_OPTIONS as an env var actually be consumed by Node.js?

Because I still suspect that some DNS lookups are still using the IPv6 family in the n8n pod.

@tej

this command uses a Kubernetes feature called hostAliases. think of it as giving your n8n pod a local phonebook. instead of asking the cluster’s flaky DNS server where Azure is (which is what’s causing your timeouts), n8n will bypass DNS entirely and use your working IPv4 address to connect directly.

just swap in your actual deployment name and that working IP address, then run this:

kubectl patch deployment <your-deployment-name> -p '{"spec": {"template": {"spec": {"hostAliases": [{"ip": "20.***.**.***","hostnames": ["resource-name.openai.azure.com"]}]}}}}'

kubernetes will roll the pod automatically with the fix applied.
