I am trying to define a custom version of the RecursiveCharacterTextSplitter node by extending its functionality with a LangChain Code node. No matter which method I override, it seems it is never called. The output is produced as if by the unmodified RecursiveCharacterTextSplitter class.
Instead of having the splitter produce plain chunks of text, I would like to format each chunk in the following way: `Instruct: Given a search query, retrieve relevant passages that answer the query.\nQuery:${chunk}`
I finally made it work. Below is the working code for a custom TextSplitter implemented with the LangChain Code node:
```js
const { RecursiveCharacterTextSplitter } = require('@langchain/textsplitters');
const { Document } = require('@langchain/core/documents');

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 100,
  keepSeparator: false,
  separators: ['\n\n', '\n', ' ', ''],
});

// A custom splitter object that wraps the base splitter's methods
const customSplitter = {
  ...splitter,
  splitDocuments: async function (documents) {
    console.log('Custom Splitter: splitDocuments called');
    // Use the base splitter to split the documents
    const splitDocs = await splitter.splitDocuments(documents);
    // Prefix each chunk with the instruct template
    return splitDocs.map((doc) => {
      const modifiedContent = `Instruct: Given a search query, retrieve relevant passages that answer the query.\nQuery:${doc.pageContent}`;
      // Return a new Document object with the modified content
      return new Document({
        pageContent: modifiedContent,
        metadata: doc.metadata,
      });
    });
  },
};

return customSplitter;
```
This way, with such a custom TextSplitter, one can preformat each chunk of data before it is sent to the embeddings model. This helps increase the accuracy of embeddings generated by models such as Qwen3, for which this specific "Instruct: …" prompt format is recommended.