Describe the problem/error/question
I’m running a production RAG chatbot using the standard AI Agent + Vector Store Retriever setup and consistently hitting 16-18s response times.
After profiling, the bottleneck isn’t the LLM or the vector DB — it’s the AI Agent node itself. It makes 2-4 internal LLM calls per query (tool selection, reasoning loops, memory handling) before generating the actual answer. For a simple “retrieve context → answer” flow, most of that work is unnecessary.
The actual useful pipeline (embedding + vector search + single LLM call) takes only 3-5s when I bypass the Agent node and use raw HTTP + Code nodes instead.
My questions for the community:
- Are you seeing similar latency with the AI Agent node in production RAG setups?
- Has anyone found a way to make the AI Agent node faster without bypassing it?
- Is the Retrieval QA Chain node faster than the AI Agent for simple RAG? Anyone benchmarked?
- For those who ditched the Agent node: what does your pipeline look like?
- Does the n8n team have plans to add a lightweight “simple RAG mode” without the reasoning loop?
Workflow A — The slow setup (AI Agent, 16-18s):
AI Agent → Vector Store Retriever (Supabase) → OpenAI/Mistral LLM
Workflow B — The fast alternative (HTTP pipeline, 3-5s):
Embed query (HTTP) → Vector search via Supabase RPC (HTTP) → Build prompt (Code) → Call LLM (HTTP) → Parse response (Code)
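For anyone curious what the “Build prompt (Code)” step looks like, here is a minimal sketch of the Code-node logic. The `content`/`similarity` field names are assumptions based on a typical Supabase `match_documents` RPC; rename them to match whatever your vector-search step actually returns:

```javascript
// Minimal sketch of the "Build prompt" Code node.
// Assumes the vector-search step returns rows shaped like
// { content: string, similarity: number } — adjust to your schema.
function buildPrompt(chunks, question, maxChunks = 5) {
  const context = chunks
    .slice(0, maxChunks)                        // cap context size
    .map((c, i) => `[${i + 1}] ${c.content}`)   // number chunks for citations
    .join('\n\n');

  // Chat-completions message array for the "Call LLM (HTTP)" node body.
  return [
    {
      role: 'system',
      content:
        'Answer using only the context below. ' +
        'If the context is insufficient, say so.\n\n' + context,
    },
    { role: 'user', content: question },
  ];
}

// Example: feed the top matches straight into the LLM request body.
const messages = buildPrompt(
  [{ content: 'n8n is a workflow automation tool.', similarity: 0.91 }],
  'What does n8n do?'
);
```

The single LLM call then just posts `{ model, messages }` to your provider, and the final Code node pulls the answer out of the response.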
The output returned by the last node
AI Agent approach: correct answers, but 16-18s latency
Raw HTTP approach: same answer quality, 3-5s latency
The difference is entirely due to the hidden LLM calls inside the AI Agent node.
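The numbers line up if you assume roughly 4s per LLM round trip (an assumption based on my 3-5s single-call pipeline); a quick back-of-envelope:

```javascript
// Back-of-envelope check: hidden agent calls explain the latency gap.
// ~4 s per LLM round trip is an assumption from my 3-5 s single-call flow.
const perCallSeconds = 4;
const hiddenCalls = [2, 3, 4];                  // internal calls per query
const totalCalls = hiddenCalls.map(n => n + 1); // plus the final answer call
const estimated = totalCalls.map(n => n * perCallSeconds);
// estimated → [12, 16, 20] s, bracketing the observed 16-18 s
```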
Information on my n8n setup
- n8n version: 2.9.4
- Database: PostgreSQL via Supabase
- Running n8n via: self-hosted instance (Business plan)