Custom RecursiveCharacterTextSplitter implementation in the LangChain Code node

Describe the problem/error/question

I am trying to define a custom version of the RecursiveCharacterTextSplitter node by extending its functionality in the LangChain Code node. No matter which method I override, it never seems to be called. The output is the same as the one produced by the unmodified RecursiveCharacterTextSplitter class.

Please share your workflow

Share the output returned by the last node

Instead of the plain text chunks produced by the splitter, I would like each chunk to be formatted as follows: "Instruct: Given a search query, retrieve relevant passages that answer the query.\nQuery:${chunk}"

Information on your n8n setup

  • n8n version: 1.105.2
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
  • Operating system: Windows 11

I finally got it working. Below is the working code for a custom text splitter implemented in the LangChain Code node:

const {
    RecursiveCharacterTextSplitter
} = require('@langchain/textsplitters');
const {
    Document
} = require('@langchain/core/documents');
const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 500,
    chunkOverlap: 100,
    keepSeparator: false,
    separators: ['\n\n', '\n', ' ', '']
});
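// A custom splitter object that wraps the base splitter.
// Note: the object spread copies only the instance's own enumerable properties
// (such as chunkSize), not methods defined on the class prototype, so only
// splitDocuments is available on this wrapper.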
const customSplitter = {
    ...splitter,
    splitDocuments: async function(documents) {
        console.log('Custom Splitter: splitDocuments called');

        // Use the base splitter to split the documents
        const splitDocs = await splitter.splitDocuments(documents);

        // Prepend the instruction prefix to the content of every chunk
        return splitDocs.map((doc) => {
            // Build the formatted chunk text
            const modifiedContent = `Instruct: Given a search query, retrieve relevant passages that answer the query.\nQuery:${doc.pageContent}`;

            // Return a new Document with the formatted content and the original metadata
            return new Document({
                pageContent: modifiedContent,
                metadata: doc.metadata,
            });
        });
    }
};
return customSplitter;

With a custom text splitter like this, each chunk can be preformatted before it is sent to the embedding model. This can improve the accuracy of the embeddings generated by models such as Qwen3, for which this specific "Instruct:…" formatting is recommended.
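
For anyone who wants to try the formatting logic outside n8n first, here is a minimal sketch of the same wrapping idea in plain JavaScript. It is not the n8n or LangChain API: `formatChunk`, `withFormattedChunks`, and `stubSplitter` are hypothetical names introduced for illustration, the stub simply splits on spaces, and plain objects stand in for LangChain `Document` instances. The only assumption about the real splitter is that it exposes an async `splitDocuments(documents)` method, as in the code above.

```javascript
// Instruction prefix taken from the post; adjust it for your embedding model.
const INSTRUCTION =
  'Instruct: Given a search query, retrieve relevant passages that answer the query.';

// Hypothetical helper: wraps a single chunk in the instruction template.
function formatChunk(text) {
  return `${INSTRUCTION}\nQuery:${text}`;
}

// Hypothetical helper: wraps any object that has an async
// splitDocuments(documents) method and formats every resulting chunk.
function withFormattedChunks(baseSplitter, format) {
  return {
    async splitDocuments(documents) {
      // Delegate the actual splitting to the wrapped splitter
      const splitDocs = await baseSplitter.splitDocuments(documents);
      // Plain objects are used here in place of LangChain Document instances
      return splitDocs.map((doc) => ({
        pageContent: format(doc.pageContent),
        metadata: doc.metadata,
      }));
    },
  };
}

// Stand-in splitter for local testing only (assumption: one chunk per word).
const stubSplitter = {
  async splitDocuments(documents) {
    return documents.flatMap((doc) =>
      doc.pageContent
        .split(' ')
        .map((word) => ({ pageContent: word, metadata: doc.metadata }))
    );
  },
};

const wrapped = withFormattedChunks(stubSplitter, formatChunk);

wrapped
  .splitDocuments([{ pageContent: 'hello world', metadata: { source: 'demo' } }])
  .then((docs) => console.log(docs.map((d) => d.pageContent)));
```

The wrapper forwards `splitDocuments` explicitly instead of spreading the base splitter. An object spread copies an instance's own properties but not the methods on its class prototype, so any other method the caller needs would have to be forwarded in the same way.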
