How to use LangChain features with more custom workflows (on cloud)

I’m adding documents to Supabase for vector embedding and retrieval, and one of the things I want to do is add the ID of the original document (from Google Drive) as metadata.

The built-in vector store node works great, but it doesn’t let you customize the behavior at all. So I can’t add custom columns or modify the metadata.

Ideally, I’d like to be able to use the Text Splitter sub-node to transform data in a regular workflow so I can do my own API call to Supabase to upload the data. But in order to use the Text Splitter I need to connect it to an AI parent node like the vector store.

Is there any way to use the text splitter without relying on the AI nodes n8n has built-in, or am I out of luck?

I’m on n8n cloud, so I can’t add LangChain directly.

It looks like your topic is missing some important information. Could you provide the following, if applicable?

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

So, I’ve figured that out. What I’ve done is use a Set Field node after I download the doc to set a file_id field to the file ID from the download, and then a “source” field built from the standard Google Docs URL plus the file ID, which gives you the URL to the actual document. Then, in your Default Data Loader node, you add those two fields in as part of the metadata. This works great and allows you to retrieve the documents based on the file ID.

My hope was to use this source data to give the chat user a link to the source document, but I can’t use the metadata in the response from the Vector Store Tool that my AI Agent is using. It seems the only way to do that is with a LangChain Code node, which of course we don’t have in Cloud.

@smittym Do you think you could provide an example here? I’m unclear on how to add fields to the metadata using these nodes. I’d like to do that.

Thankfully, retrieval is a lot easier and doesn’t require LangChain. I’ve built out several flows that let me run a query on Supabase and get back relevant results, along with the metadata or other columns I need.
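As a rough sketch, the query side looks something like this with supabase-js, assuming you’ve set up the usual documents table and the match_documents function from the Supabase vector store template (those names come from that template and my own setup, not from the built-in nodes):

```ts
import { createClient } from "@supabase/supabase-js";

// Assumes the standard Supabase vector setup: a "documents" table with
// content / embedding / metadata columns and a "match_documents" SQL function.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// queryEmbedding comes from whatever embeddings call you make on the user's question.
async function retrieve(queryEmbedding: number[], matchCount = 5) {
  const { data, error } = await supabase.rpc("match_documents", {
    query_embedding: queryEmbedding,
    match_count: matchCount,
  });
  if (error) throw error;

  // Each row carries the chunk text plus the metadata added at upsert time,
  // so the file_id and source link come back alongside the content.
  return (data ?? []).map((row: any) => ({
    content: row.content,
    fileId: row.metadata?.file_id,
    source: row.metadata?.source,
    similarity: row.similarity,
  }));
}
```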

I don’t like the narrow approach that has been provided with these built-in AI nodes—it makes it easy to create some example projects but not to create anything that deviates from the norm (which I find most projects actually do).

Sure, I’ll be happy to share. I did end up switching to the Question and Answer Chain node for my project since I didn’t really need agent functionality, and it will pass through the source information.
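To make the metadata part concrete, here’s roughly what one chunk ends up looking like in the Supabase documents table once the two extra fields are added in the Default Data Loader’s metadata section. The field names (file_id, source) are just what I used, and source is the standard Google Docs URL built from the Drive file ID, so adjust for your setup:

```ts
// Rough shape of one row in the "documents" table after the upsert.
// "file_id" and "source" are the two custom metadata fields set with the Set
// node earlier in the workflow; the rest comes from the splitter / embeddings.
type DocumentRow = {
  content: string;       // chunk text produced by the text splitter
  embedding: number[];   // embedding vector for the chunk
  metadata: {
    file_id: string;     // Google Drive file ID from the download step
    source: string;      // link back to the original document
  };
};

// Building "source" from the file ID (standard Google Docs URL pattern).
const fileId = "EXAMPLE_DRIVE_FILE_ID"; // placeholder, not a real ID
const exampleRow: DocumentRow = {
  content: "…one chunk of the document text…",
  embedding: [/* floats from the embeddings node */],
  metadata: {
    file_id: fileId,
    source: `https://docs.google.com/document/d/${fileId}/edit`,
  },
};
```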


I like your approach to the subfolder crawling. I was going to do this for a client of mine and ended up just switching to all files at the top level folder, but I wondered if there’d be a way of doing a recursive crawler that can go as many levels as needed.

Thanks for sharing this. I think my n8n version was out of date and didn’t have the metadata options on the data loader yet.

Yeah, I tried and tried to create a true recursive search and finally gave up. As you’ll note, this only goes three levels deep, but that was enough for my use case. I’m sure a recursive search is possible, but I didn’t want to spend any more time on it since this was a one-time upsert of documents into my vector DB.

Somewhere on the forum there’s a code sample that lets you gather up the data from all previous executions of a loop into one list before proceeding. I’m guessing with some code to manipulate variables it would be possible to create a loop that continually goes through the same nodes until it reaches a dead-end.
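From memory, the gathering part is something along these lines. This is an untested sketch for an n8n Code node, and “Loop Over Items” is just a placeholder for whatever node your loop actually runs:

```ts
// Collect the output from every previous run of a looped node into one list.
// Untested sketch for an n8n Code node; adjust the node name to your workflow.
const allData = [];
let runIndex = 0;

while (true) {
  let items;
  try {
    // $("<node>").all(branchIndex, runIndex) returns that node's items for a
    // specific run of the loop; stop once there are no more runs to read.
    items = $("Loop Over Items").all(0, runIndex);
  } catch (e) {
    break;
  }
  if (!items || items.length === 0) break;

  allData.push(...items.map((item) => item.json));
  runIndex++;
}

return [{ json: { allData } }];
```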

But it’s not something I want to spend time figuring out unless I actually have to :laughing:

I wish there was a way to use code to make a fetch request using built-in creds. If there were, it’d be trivial to make a code step that’s capable of this.
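For what it’s worth, if you could get an access token into the Code node some other way (pasted in, or passed along from an earlier HTTP Request step), the recursive part itself is short. This is a hypothetical, untested sketch against the Drive v3 files.list endpoint; it skips pagination and error handling:

```ts
// Hypothetical sketch: recursively list every file under a Drive folder using
// plain fetch. The hard part on n8n Cloud is supplying accessToken yourself,
// since the Code node can't borrow the built-in Google credentials.
async function listFolderRecursive(folderId, accessToken, files = []) {
  const query = encodeURIComponent(`'${folderId}' in parents and trashed = false`);
  const url =
    `https://www.googleapis.com/drive/v3/files?q=${query}` +
    `&fields=files(id,name,mimeType)`; // pagination (pageToken) omitted for brevity

  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  const { files: children = [] } = await res.json();

  for (const child of children) {
    if (child.mimeType === "application/vnd.google-apps.folder") {
      await listFolderRecursive(child.id, accessToken, files); // go deeper
    } else {
      files.push(child);
    }
  }
  return files;
}
```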