I did notice the code implementation, and I’m actually using a similar approach myself. However, I’ve observed that this method sometimes cuts off parts of words. For example, a sentence like “hello I am…” ends up as “llo I am…”, so the splitter isn’t preserving word boundaries.
That’s why I’m considering the possibility of directly importing the LangChain text splitter in a Code node.
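A rough sketch of what that could look like, shown here in Python purely for illustration (how you import packages inside a Code node depends on your n8n setup, so treat the details as assumptions). LangChain’s recursive splitter tries paragraph, sentence, and word separators first, which is what keeps chunks from starting mid-word like “llo I am…”:

```python
# Rough sketch, not from the original post: LangChain's recursive character
# splitter prefers paragraph/sentence/word boundaries, so chunks begin on
# whole words instead of being cut like "llo I am…".
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
text = "hello I am a longer piece of text that needs to be split into chunks. " * 10
chunks = splitter.split_text(text)
print(chunks[0])  # every chunk starts on a word boundary
```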
I have created a Code node and want to directly embed text into my Pinecone vector store with my custom chunking from that node. However, from my understanding, I have to use the Pinecone Vector Store n8n node to embed my text into my vector storage, and that vector store node requires a text splitter. But I don’t want a text splitter; I already split my text in a custom way for embedding. Can I just go directly from my Code node and embed into vector storage without using the standard Pinecone node? If so, how do I do that?
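One way to sidestep the Pinecone Vector Store node entirely is to compute the embeddings yourself and upsert them through the Pinecone API (for example from an external script, or via HTTP Request nodes). A minimal sketch with the OpenAI and Pinecone Python clients follows; the index name, model, and metadata layout are placeholders, not anything taken from the workflow above:

```python
# Minimal sketch, assuming OpenAI embeddings and the Pinecone Python client.
# Index name, model, and namespace are placeholders.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                    # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_KEY")  # placeholder key
index = pc.Index("my-index")                # hypothetical index name

# Pre-chunked text coming out of the Code node / custom chunker
chunks = ["hello I am the first chunk", "and this is the second chunk"]

# One embeddings request for all chunks
resp = openai_client.embeddings.create(model="text-embedding-3-small", input=chunks)

vectors = [
    {"id": f"chunk-{i}", "values": item.embedding, "metadata": {"text": chunk}}
    for i, (chunk, item) in enumerate(zip(chunks, resp.data))
]
index.upsert(vectors=vectors, namespace="default")
```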
I created a custom flow for this. It does not use the n8n Pinecone doc loader at all. It’s a custom doc loader that builds custom chunks for me via a Python script I’m currently hosting locally. It took a few hours to build and troubleshoot, but I think it’s going to be worth it. IMO it would be cool for n8n to support custom text splitters instead of only the native n8n options.
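The original script isn’t posted, so the following is only a hypothetical minimal version of such a locally hosted chunker: a small Flask endpoint that packs whole words into fixed-size chunks. The route, port, and chunking rule are all assumptions.

```python
# Hypothetical minimal chunking service (the actual script above isn't shown).
# n8n could call this with an HTTP Request node and receive a list of chunks.
from flask import Flask, request, jsonify

app = Flask(__name__)

def chunk_words(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole words into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

@app.post("/chunk")
def chunk():
    payload = request.get_json(force=True)
    return jsonify({"chunks": chunk_words(payload["text"], payload.get("max_chars", 500))})

if __name__ == "__main__":
    app.run(port=8000)
```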
Instead of building a custom doc loader, you can use a character text splitter and simply set the chunk size to something very large (e.g. 10,000); then it will not split your text at all.
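To show what that workaround does in practice, here is a small sketch with the LangChain character splitter, assuming your documents are shorter than the chunk size:

```python
# Sketch of the "huge chunk size" workaround: text shorter than chunk_size
# comes back as a single, unsplit chunk.
from langchain_text_splitters import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=10_000, chunk_overlap=0)
chunks = splitter.split_text("hello I am a short document that should stay whole")
print(chunks)  # ['hello I am a short document that should stay whole']
```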