Got tired of "MODEL_NOT_FOUND" when pointing N8N's AI Agent at non-OpenAI backends — so I wrote a small proxy

Got annoyed twice over.

First, N8N’s AI Agent just refuses to work with Together.ai (or reallyany non-OpenAI endpoint). Throws “MODEL_NOT_FOUND” with no useful body. Took me a while to find out why: LangChain.js calls /v1/responses by default, but Together — and basically every local backend — only speaks /v1/chat/completions. So every request 404s and the error message is useless.

Second, I’ve got a few llama.cpp / llama-swap boxes running at home and wanted to point N8N at one URL, swap models without editing every workflow, and fall back to a cloud backend when the local ones are busy with something else. LiteLLM exists but felt much heavier than
what I actually needed.

So I wrote a small OpenAI-compatible proxy:

  • Aliases like fast / translator / vision map to different real
    models per backend — swap the model in YAML, every workflow follows
  • Priority routing with failover: local boxes first, Together as emergency fallback
  • /v1/responses ↔ /v1/chat/completions bridge with tool-call round-trip - N8N’s AI Agent works against any chat-completions backend now
  • Per-backend api_key so Together / OpenAI / OpenRouter ride along as just another backend
  • Hot config reload — edit YAML, no restart
  • ~400 lines of Python, FastAPI, MIT

Curious if anyone else has been bumping into this. Happy to help wireit up if you want to try it.

1 Like

Great!! Quite similar to LiteLLM

I tried LiteLLM, but was not working for me with togther.ai and priority of my multiple LLM endpoints at home. But yes LiteLLM is the standard.

1 Like

Have not tried. LiteLLM so far worked well for me on OpenRouter, GoogleAi Studio, CloudFlare, NIM etc…

Update:

Together.ai is supported on LiteLLM

1 Like

Thanks, works! I have already LiteLLM at the same LXC. Easy Test. But doesn’t resolve my requirument with alias and priorities.
Strict priority routing across backends sharing the same alias. Unlike LiteLLM’s fallbacks (which maps one model name to another model name on failure), the gateway treats priority as a first-class deployment ordering. One alias fast can route to a local llama.cpp box first and a cloud provider as fallback - and that ordering is exactly what runs, every time, no routing-strategy ceremony needed.

But maybe i have to overthink my requirurements :smiley: so that this fit more to LiteLLM.

1 Like

I think it works for me

What I like most for LiteLLM is I can have different fallbacks for different models

1 Like