How to use an Agent with Master Data Validation in n8n

Pichayut_Prasertwit · June 15, 2025, 5:02pm

I’m trying to build a flow in n8n where I use an AI agent (like OpenAI) to help generate some structured data, for example a list of job positions in a department.

But I don’t want the agent to just make stuff up. I already have a master list (could be in Google Sheets or a DB) that has valid data. So ideally, I want the agent to look at that list, compare it with the input (like “HR department” + a short description), and return only the valid positions from the master.

For example:

Input: Department = “HR”, Description = “deals with internal people stuff”
Output: Only positions from the HR section of the master list (e.g., HR Manager, Recruiter, etc.)

Not sure what’s the best approach here. Should I just get all the master data and feed it into the prompt? Or maybe use a vector store? Or something else entirely?
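A minimal sketch of the "feed the master data into the prompt" option, assuming the master list is small enough to fit in the prompt once it is filtered by department. The `MASTER` rows, the function names, and the prompt wording are all hypothetical; in n8n the rows would come from a Google Sheets or database node. The second step checks the agent's answer against the master list, so anything it invents is dropped regardless of how the prompt was built.

```python
from typing import Dict, List

# Hypothetical master list; in practice this is loaded from Google Sheets or a DB.
MASTER: List[Dict[str, str]] = [
    {"department": "HR", "position": "HR Manager"},
    {"department": "HR", "position": "Recruiter"},
    {"department": "IT", "position": "Backend Developer"},
]


def valid_positions(department: str) -> List[str]:
    """Return the master positions for one department (case-insensitive match)."""
    wanted = department.strip().lower()
    return [row["position"] for row in MASTER if row["department"].lower() == wanted]


def build_prompt(department: str, description: str) -> str:
    """Build a prompt that lists only the allowed positions for this department."""
    allowed = "\n".join(f"- {p}" for p in valid_positions(department))
    return (
        f"Department: {department}\n"
        f"Description: {description}\n"
        "Choose only from these positions and do not invent new ones:\n"
        f"{allowed}"
    )


def filter_agent_output(department: str, proposed: List[str]) -> List[str]:
    """Keep only proposed positions that exist in the master list for the department."""
    allowed = {p.lower(): p for p in valid_positions(department)}
    return [allowed[p.strip().lower()] for p in proposed if p.strip().lower() in allowed]


if __name__ == "__main__":
    print(build_prompt("HR", "deals with internal people stuff"))
    # Simulated agent answer containing one invented position.
    print(filter_agent_output("HR", ["Recruiter", "Chief Happiness Officer", "hr manager"]))
```

The post-filter is the part that enforces validity. The prompt only makes a valid answer more likely.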

Would love to see if anyone’s done something like this. Open to ideas or rough direction. Thanks in advance.

rbreen · June 15, 2025, 5:06pm

Hi @Pichayut_Prasertwit ,

You should use a vector store with embeddings for this. With embeddings, the agent can retrieve the items in the vector store that are most similar to the input and base its output on them. Give me a min and I’ll upload an example here.
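To illustrate what "return similar items" means, here is a toy sketch of similarity retrieval. It is not how n8n or Supabase implement it: a real setup would use an embedding model and a vector database, while this example swaps in simple word-count vectors and cosine similarity so that it runs on its own. The documents and function names are made up for illustration.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)  # Counter gives 0 for missing tokens
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


# Hypothetical master-list entries stored as text documents.
DOCS = [
    "HR Manager: leads the HR department and people policies",
    "Recruiter: HR role that hires new people",
    "Backend Developer: builds server APIs",
]

if __name__ == "__main__":
    for doc in top_k("HR people", DOCS):
        print(doc)
```

Retrieval like this ranks by similarity, so it can surface near matches but does not guarantee that every valid row for a department is returned.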

rbreen · June 15, 2025, 5:52pm

Here’s an example workflow that moves data from Google Sheets into Supabase, then has an agent use the vector store for information.

If this response helped you, please click the heart to show that it is useful
If this response solved your issue, mark it as the solution to help the community

Pichayut_Prasertwit · June 16, 2025, 3:10am

Thank you so much for your answer.

I still have some questions.

Since vector stores usually chunk the data, I’m wondering, when retrieving info, will it bring in all the relevant context properly?

Like, what if the info is split across chunks or isn’t stored together nicely? Will it still work reliably?

Feels a bit different from pulling clean, structured lists from something like Google Sheets or a database, where you know the whole row or group comes together.

rbreen · June 16, 2025, 12:22pm

Hi @Pichayut_Prasertwit

The workflow is built to store the data in two ways.

First, it stores each row of the Google Sheet exactly as it is, as a document row. The AI Agent has a tool to query these document rows: it writes SQL and gets back the actual rows stored in Supabase.
In addition, the Google Sheet is chunked and embedded in the documents table. The AI Agent can reference the chunked embeddings as additional data to support the rows returned by the query.
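The structured half of this design can be sketched as follows. This is an illustration under assumptions, not the workflow's actual schema: an in-memory SQLite database stands in for Supabase (Postgres), and the table name `document_rows` and its columns are hypothetical. The point is that a SQL query returns whole rows for a department, so nothing is lost to chunk boundaries.

```python
import sqlite3
from typing import List

# In-memory SQLite stands in for Supabase; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_rows (department TEXT, position TEXT)")
conn.executemany(
    "INSERT INTO document_rows (department, position) VALUES (?, ?)",
    [
        ("HR", "HR Manager"),
        ("HR", "Recruiter"),
        ("IT", "Backend Developer"),
    ],
)
conn.commit()


def positions_for(connection: sqlite3.Connection, department: str) -> List[str]:
    """Return every stored position for a department, as complete rows from the table."""
    cursor = connection.execute(
        "SELECT position FROM document_rows WHERE department = ? ORDER BY position",
        (department,),  # parameterized to avoid SQL injection from agent-written input
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    print(positions_for(conn, "HR"))
```

With this split, the SQL tool answers "which positions are valid", and the embedded chunks only add supporting context.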


system · June 23, 2025, 12:23pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.