Help Needed: AI Automation in n8n Not Responding Accurately.

I’m really struggling to build an automation in n8n that acts as an AI helpdesk assistant. I’m using Ollama with the Qwen2.5 14B and Qwen3 14B models, but it just isn’t working: responses are inaccurate, and it fails to answer the questions. I’ve tried adding a dictionary bank, vectorizing the data, and using Qdrant as the vector store, but I’m still stuck. I would really appreciate any advice or best practices. I’m honestly feeling a bit lost here, so any help would mean a lot.

Hi @Igor4, welcome!!
I think (1) prompts, (2) few-shot examples, and (3) sampling settings like temperature and top_p definitely matter, but they usually don’t fix the core problem by themselves.

For helpdesk-style RAG, the bigger issue is often retrieval quality and workflow design:

- chunking
- metadata
- grounding
- making sure the right context is passed in before the model answers
In n8n, RAG is really about retrieving the right documents from a vector store first, then using the LLM on top of that.
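To make the retrieve-then-answer order concrete, here is a minimal stdlib-only sketch of that flow. The 3-dimensional vectors and document texts are made up purely for illustration; in a real n8n workflow the embeddings would come from your embedding model and live in Qdrant:

```python
import math

# Toy in-memory "vector store". In production these vectors come from an
# embedding model and are stored in Qdrant; these tiny ones are placeholders.
DOCS = [
    ("To reset a password, open Settings > Security.", [0.9, 0.1, 0.0]),
    ("Invoices are emailed on the 1st of each month.", [0.1, 0.8, 0.2]),
    ("VPN access requires a ticket to the IT desk.",   [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two vectors (what Qdrant typically scores by)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, k=2):
    """The 'R' in RAG: rank documents by similarity and keep the top k."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend the user's question embedded to this vector; retrieval happens
# BEFORE the LLM sees anything, and only the retrieved text reaches the prompt.
context = retrieve([0.85, 0.15, 0.05])
prompt = "Answer using ONLY this context:\n" + "\n".join(context)
```

If the documents returned by `retrieve` are wrong, no amount of prompt tuning downstream can fix the answer — that is why I would debug retrieval first.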

So yes, I would look at prompt tuning and model parameters too, but I would focus first on the architecture and the quality of the retrieved context.
I’d also recommend reading some articles on RAG best practices with n8n.

Hi @Igor4 — RAG quality is definitely the bottleneck here. A few tactical things:

  1. Chunking strategy — how are you splitting your documents? Semantic chunks (by sentence/section) usually beat fixed-size. Make sure no chunk loses important context.

  2. Vector quality — make sure your embedding model is well-suited for your domain, and verify that your Qdrant collection is storing actual vector embeddings (not just the raw text). Retrieval relevance directly impacts LLM output.

  3. Grounding the prompt — before asking Qwen to answer, explicitly inject the top-k retrieved docs into the system prompt. Something like: “You are a helpdesk assistant. Answer ONLY using the following context: [retrieved docs]. If context doesn’t help, say so.”

  4. Temperature + top_p — start with temperature=0.3 (less creative, more factual) and top_p=0.8. This helps with accuracy on knowledge-heavy tasks.
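For point 1, a minimal sentence-based chunker could look like the sketch below (stdlib only; `max_chars` and the one-sentence overlap are illustrative defaults, not recommendations from any particular library):

```python
import re

def chunk_by_sentence(text, max_chars=200, overlap=1):
    """Split text into chunks of whole sentences instead of fixed-size
    windows, so no chunk cuts a sentence (and its meaning) in half.
    `overlap` repeats the last N sentences of a chunk at the start of the
    next one, preserving context across chunk boundaries."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))          # flush the full chunk
            current = current[-overlap:] if overlap else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Compare the retrieval hit rate with this against your current fixed-size splitter before changing anything else — it isolates whether chunking is really the problem.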
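Points 3 and 4 can be combined into one request payload. This sketch builds an Ollama `/api/chat` request body with the grounded system prompt and the conservative sampling settings above; the model tag and documents are placeholders for whatever you actually use:

```python
def build_grounded_payload(question, retrieved_docs):
    """Assemble an Ollama /api/chat request body that pins the model to the
    retrieved context and uses low-creativity sampling settings."""
    context = "\n\n".join(f"[doc {i + 1}] {d}" for i, d in enumerate(retrieved_docs))
    system = (
        "You are a helpdesk assistant. Answer ONLY using the following "
        "context. If the context does not contain the answer, say so.\n\n"
        + context
    )
    return {
        "model": "qwen2.5:14b",  # placeholder model tag
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        # Less creative, more factual -- good for knowledge-heavy tasks.
        "options": {"temperature": 0.3, "top_p": 0.8},
        "stream": False,
    }

payload = build_grounded_payload(
    "How do I reset my password?",
    ["To reset a password, open Settings > Security."],
)
```

In n8n you would POST this JSON to your Ollama endpoint from an HTTP Request node (or set the same options on the Ollama chat node, if that is what you are using).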

Try these in sequence and let us know which one makes the biggest difference. The architecture is sound; the culprit is usually retrieval quality or prompt framing.
