Describe the problem/error/question
Hi everyone,
I’m building a RAG agent in n8n for an indoor cycling company. The setup works but has reliability issues I can’t solve through prompting alone, and I’d appreciate input from anyone who’s built production-grade agents.
Setup:
- Webhook → AI Agent (OpenAI Chat Model) → Edit Fields → Respond to Webhook
- Tools available to the agent:
  - Airtable (Search): distributor lookup by country
  - Pinecone: Ride High magazine content (issues 3–27)
  - SearchAPI (Google, site:body-bike.com): products and company info
  - HTTP Request: currently disabled via prompt

What works:
- Airtable lookups are fast and accurate when the country matches
- Pinecone returns relevant magazine content for thematic questions

The issues:
- Tool selection is unreliable. The agent sometimes calls 2–3 tools when one would do, or calls the wrong tool entirely (e.g. Google + Airtable for a pure magazine question). Even with explicit prompt rules (“magazine questions → Pinecone only”), it ignores them.
- Hallucinated source URLs and dates. When citing Ride High issues, it invents months that don’t match the actual issue URL. Example: it writes “Issue 18 January 2023” when the real URL is december-2022. I’ve tried hardcoded date lists in the prompt and explicit examples, and it still fails.
- Source mismatch. Sometimes a distributor answer (from Airtable) ends with a Ride High magazine URL as the citation. The source link doesn’t match the content of the answer.
- Hallucinated geographic coverage. When a country has no distributor in Airtable (correct behavior should be “no local distributor, contact HQ”), the agent suggests a nearby distributor instead and invents claims like “covers Benelux including Germany”, which is false.
- Model behavior differs. gpt-4o-mini ignores most rules. gpt-4o follows more of them but isn’t perfect. Latency on gpt-4o is 15–20 seconds, which feels high.

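For the date problem in particular, one option I'm considering is taking the date away from the model entirely and deriving it from the URL the retriever returns. Below is a minimal Python sketch of that idea (similar logic could presumably live in an n8n Code node). It assumes the issue URL is available alongside each Pinecone match and that the slug contains a month-year pair like december-2022. The function name and the example.com URL are placeholders made up for illustration, not my real setup.

```python
import re
from typing import Optional

MONTHS = (
    "january|february|march|april|may|june|"
    "july|august|september|october|november|december"
)
# Matches a "month-year" pair such as "december-2022" anywhere in the URL.
SLUG_DATE = re.compile(rf"({MONTHS})-(\d{{4}})", re.IGNORECASE)


def issue_date_from_url(url: str) -> Optional[str]:
    """Build the citation date from the URL slug instead of trusting the model."""
    match = SLUG_DATE.search(url)
    if match is None:
        return None  # no date in the slug: better to omit the date than guess
    month, year = match.group(1), match.group(2)
    return f"{month.capitalize()} {year}"


# Placeholder URL, only the "december-2022" part reflects my real case.
print(issue_date_from_url("https://example.com/ride-high/issue-18-december-2022"))
```

With something like this, the model would only return the issue URL it used, and the "Issue 18, December 2022" label would be assembled afterwards in code, so the month can never drift from the link.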
What I’ve tried:
- Hard rules at the top of the system prompt with examples of correct vs. incorrect output
- Forbidding HTTP Request explicitly
- Hardcoding the latest issue number
- Reducing prompt length

My questions:
- Is there a more reliable way to enforce tool selection than prompt rules? (Routing layer before the agent?)
- How do people handle deterministic post-processing (e.g. fixing dates/URLs) without adding latency?
- Is a validator agent worth it given the latency cost, or is there a better pattern?
- Any tips for getting consistent source attribution that matches the actual tool that was called?

I feel like I’m in a whack-a-mole cycle: every time I fix one issue, a new one appears. Fix tool selection → dates break. Fix dates → source URLs break. Fix sources → hallucinated coverage breaks. Is this just how LLM agents work, or am I missing a fundamental approach that would stabilize the whole thing?
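To make the routing-layer question concrete, this is roughly what I have in mind: a plain keyword router that runs before the agent and picks exactly one tool. It's a minimal Python sketch only. The keyword lists, the tool labels, and the `route` function are hypothetical and would need tuning against real questions.

```python
import re

# Hypothetical keyword lists, one entry per tool; checked in this order.
ROUTES = {
    "airtable": ["distributor", "dealer", "reseller", "where can i buy"],
    "pinecone": ["ride high", "magazine", "issue", "article"],
}
DEFAULT_TOOL = "searchapi"  # products and company info as the fallback


def route(question: str) -> str:
    """Return the name of the single tool this question should go to."""
    text = question.lower()
    for tool, keywords in ROUTES.items():
        # Whole-word match so that e.g. "issue" does not fire inside "tissue".
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            return tool
    return DEFAULT_TOOL


print(route("Who is the distributor in Denmark?"))       # airtable
print(route("What did Ride High say about recovery?"))   # pinecone
print(route("How much does the bike weigh?"))            # searchapi
```

The appeal for me is that the tool choice would also fix the citation type: if the router picked Airtable, the answer could not end with a magazine URL. What I don't know is whether something this crude holds up on mixed questions, or whether people use a small classifier model for this step instead.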
Any guidance appreciated — happy to share more details, screenshots, or my system prompt.
Thanks!