Describe the problem/error/question
Hi all!
I'm pretty new to this framework and the whole AI-agent-building scene.
I was trying to set up a chatbot on Slack where users can ask for technical info about a project my organization is working on.
The knowledge base is built from a set of six GitHub repositories that are loaded on a schedule trigger.
I know I'm not using any text splitter, but whenever I add one I'm not able to complete the whole workflow (too much data? a timeout being triggered?).
However, the real problem I'm facing is that the Question and Answer chain, when querying the vector store, doesn't seem to return any contextually relevant pages. It seems to just return the first X pages it has in the DB and answer based on what it finds there.
I noticed this by changing the order of the repositories that I import and by debugging the model’s answers.
How can I fix this? Is it because I'm using the Simple Vector Store?
Also, I'm noticing the model is pretty slow in producing answers. Does this depend on my usage plan for the Gemini API?
Thanks in advance!
Information on your n8n setup
- n8n version: 1.93.0
- Running n8n via n8n cloud
- Operating system: Windows 11
Yes, I'd 100 percent not use the Simple Vector Store for prod. Switch over to Supabase or maybe Qdrant for better performance. The next thing would be adding metadata to the data you're storing; you can then reference it inside an AI Agent, which makes it easier to target the exact data you want. Hope that helps.
Also, I can't say how Google Gemini is, but frankly I always use OpenAI. You may see better results, so maybe something to consider too.
Thanks for the answer!
I thought the whole point of vectorization and embedding was to query the DB in a "smart" way, so it retrieves the most relevant results given the query.
But here that doesn't seem to be the case, so I'm definitely missing something.
Hello,
So yeah, the whole point of vector embeddings is to bring back the most relevant content based on meaning, not just keywords.
I wasn't using a text splitter properly before either. If you feed in big chunks (like whole files), the embedding gets too vague and the search doesn't really work. Once I started splitting the content into smaller pieces, the relevance improved a lot.
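To give an idea of what "relevant based on meaning" looks like under the hood, here's a toy sketch (not n8n's actual implementation): retrieval embeds the query, then ranks stored chunks by cosine similarity to that query vector. The document names and 3-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# toy 3-dimensional "embeddings" (hypothetical documents)
docs = {
    "deployment guide": [0.9, 0.1, 0.2],
    "api reference":    [0.2, 0.8, 0.3],
    "changelog":        [0.1, 0.2, 0.9],
}
query = [0.85, 0.15, 0.1]  # pretend embedding of "how do I deploy?"

# rank every stored document by similarity to the query, take the best
best = max(docs, key=lambda name: cosine(docs[name], query))
print(best)  # -> deployment guide
```

If the store returns the same first-inserted pages regardless of the query, something in this ranking step isn't actually happening, which matches what you're seeing.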
Also, adding metadata like the repo name or file path really helps. You can filter better or just guide the AI more when it sees that info.
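Conceptually, metadata lets you narrow the candidate set before (or alongside) the similarity step. A toy sketch, with hypothetical field names (`repo`, `path`); in practice you'd set these on each document at ingestion time and filter in the vector store's query:

```python
# each chunk carries metadata attached when it was loaded
chunks = [
    {"text": "Auth flow uses OAuth2...",   "repo": "backend",  "path": "docs/auth.md"},
    {"text": "Button component props...",  "repo": "frontend", "path": "src/Button.tsx"},
    {"text": "Token refresh endpoint...",  "repo": "backend",  "path": "docs/api.md"},
]

def search(chunks, repo=None):
    # metadata pre-filter: only chunks from the requested repo survive,
    # so the similarity ranking runs over a smaller, on-topic set
    return [c for c in chunks if repo is None or c["repo"] == repo]

hits = search(chunks, repo="backend")
print([c["path"] for c in hits])
```

Supabase and Qdrant both support this kind of payload/metadata filtering natively, which is another reason to move off the Simple Vector Store.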
But honestly, I'd say the main thing is that the Simple Vector Store just isn't great for real use. It works for testing, but it's super basic. I switched to Supabase and saw way better results; Qdrant's good too.
And yeah, I've stuck with OpenAI for embeddings and chat; it feels a lot more reliable than Gemini in my experience.
Hope this helps,
Samuel