Got annoyed twice over.
First, N8N’s AI Agent just refuses to work with Together.ai (or reallyany non-OpenAI endpoint). Throws “MODEL_NOT_FOUND” with no useful body. Took me a while to find out why: LangChain.js calls /v1/responses by default, but Together — and basically every local backend — only speaks /v1/chat/completions. So every request 404s and the error message is useless.
Second, I’ve got a few llama.cpp / llama-swap boxes running at home and wanted to point N8N at one URL, swap models without editing every workflow, and fall back to a cloud backend when the local ones are busy with something else. LiteLLM exists but felt much heavier than
what I actually needed.
So I wrote a small OpenAI-compatible proxy:
- Aliases like
fast/translator/visionmap to different real
models per backend — swap the model in YAML, every workflow follows - Priority routing with failover: local boxes first, Together as emergency fallback
- /v1/responses ↔ /v1/chat/completions bridge with tool-call round-trip - N8N’s AI Agent works against any chat-completions backend now
- Per-backend api_key so Together / OpenAI / OpenRouter ride along as just another backend
- Hot config reload — edit YAML, no restart
- ~400 lines of Python, FastAPI, MIT
Curious if anyone else has been bumping into this. Happy to help wireit up if you want to try it.

