Help needed: Any way to use langchain text splitter in n8n code node

I want to manually chunk input text with a langchain text splitter and then output an array of chunks.

Currently, n8n's text splitter node needs to be attached to an AI Agent or LLM Chain node.

Is there any way I could use langchain's recursive text splitter in a Code node instead? Thanks :pray:


Hi @TAN_YONG_SHENG,

Here are a couple of ways to do it: one with a code step, and one with an LLM step to split the text.
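For the code step, a minimal fixed-size splitter in a Code node could look something like this (the `text` field name and the size/overlap numbers are just placeholders):

```js
// Fixed-size chunking with overlap inside an n8n Code node.
// Assumes each incoming item has a `text` field; adjust to your data.
const text = $input.first().json.text;
const chunkSize = 500;    // characters per chunk (example value)
const chunkOverlap = 50;  // characters shared between neighbouring chunks

const chunks = [];
for (let start = 0; start < text.length; start += chunkSize - chunkOverlap) {
  chunks.push(text.slice(start, start + chunkSize));
}

// Return one n8n item per chunk
return chunks.map((chunk) => ({ json: { chunk } }));
```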

Best,

Robert

Thanks, @rbreen!

I did notice the code implementation, and I’m actually using a similar approach myself. However, I’ve observed that this method sometimes cuts off parts of words. For example, a sentence like “hello I am…” ends up as “llo I am…”, which means it’s not preserving the original word properly.

That's why I'm considering the possibility of directly importing the langchain text splitter in the Code node.
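On a self-hosted instance, something along these lines might work, assuming the langchain package is installed in the n8n environment and external imports are allowed via the NODE_FUNCTION_ALLOW_EXTERNAL environment variable (the exact import path depends on the langchain version, and I haven't verified this end to end):

```js
// Requires self-hosted n8n with NODE_FUNCTION_ALLOW_EXTERNAL=langchain
// and the langchain package installed where n8n runs.
const { RecursiveCharacterTextSplitter } = require('langchain/text_splitter');

// Assumes the incoming item has a `text` field.
const text = $input.first().json.text;

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,   // example values
  chunkOverlap: 50,
});

// splitText returns an array of strings, splitting on paragraph, sentence,
// and word boundaries before falling back to raw characters, which should
// avoid the mid-word cuts from the naive slicing approach.
const chunks = await splitter.splitText(text);

return chunks.map((chunk) => ({ json: { chunk } }));
```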

Otherwise, I might just build an API endpoint to handle my custom needs: https://medium.com/@rentierdigital/calling-a-python-script-from-n8n-5d2a34c9cd09

Thanks again.

Regards,
Yong Sheng


Hi @TAN_YONG_SHENG,

That's an interesting idea. I also included an OpenAI step as an alternative to the code step. It might be better at splitting the text so the chunks make sense.

Best,

Robert


I am running into this limitation right now.

I have created a Code node and want to embed text directly into my Pinecone vector store using the custom chunking from that node. However, from my understanding I have to use the Pinecone Vector Store n8n node to get my text into the vector storage, and that node requires a text splitter. But I don't want a text splitter; I have already split my text in a custom way for embedding. Can I go directly from my Code node and embed into the vector storage without using the standard Pinecone node? If so, how do I do that?

UPDATE:

I created a custom flow for this. It does not use the n8n Pinecone doc loader at all. It's a custom doc loader that builds custom chunks for me via a Python script I'm currently hosting locally. It took a few hours to build and troubleshoot, but I think it's going to be worth it. IMO it would be cool for n8n to support custom text splitters instead of only the native n8n options.
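For anyone wanting to skip the Pinecone Vector Store node entirely, the embed-and-upsert step from a Code node can be sketched roughly like this. This is a simplified sketch rather than my exact script: the index host, env variable names, and metadata fields are placeholders, and it assumes this.helpers.httpRequest and $env are available in your Code node.

```js
// Sketch: embed pre-chunked text and upsert straight to Pinecone from a Code node.
// Assumes one chunk per incoming item in a `chunk` field, and that env access
// is allowed so $env can read the API keys.
const chunks = $input.all().map((item) => item.json.chunk);

const vectors = [];
for (let i = 0; i < chunks.length; i++) {
  // Embed each chunk (OpenAI's embeddings API shown as one option).
  const res = await this.helpers.httpRequest({
    method: 'POST',
    url: 'https://api.openai.com/v1/embeddings',
    headers: { Authorization: `Bearer ${$env.OPENAI_API_KEY}` },
    body: { model: 'text-embedding-3-small', input: chunks[i] },
    json: true,
  });

  vectors.push({
    id: `chunk-${i}`,
    values: res.data[0].embedding,
    metadata: { text: chunks[i] },
  });
}

// Upsert via Pinecone's REST API (replace with your own index host).
await this.helpers.httpRequest({
  method: 'POST',
  url: 'https://YOUR-INDEX-HOST.pinecone.io/vectors/upsert',
  headers: { 'Api-Key': $env.PINECONE_API_KEY },
  body: { vectors },
  json: true,
});

return [{ json: { upserted: vectors.length } }];
```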

Instead of going with custom doc loading, you can use a Character Text Splitter and simply set the chunk size very large (e.g. 10,000); then it will not chunk your text.