How can I adapt Data Loader output to match Cohere embedding input format in n8n?

Hi everyone,

I’m currently building a workflow in n8n to ingest documents from an S3 bucket, embed them using the Cohere multilingual embedding model (via AWS Bedrock), and store the resulting vectors in a vector database.

However, I’ve encountered an issue: the output format of the Data Loader node (e.g., when loading documents from S3) does not seem to match the required input format of the Cohere embedding model. Specifically, the Cohere model expects an input structure like:

{
  "texts": ["Document text here"],
  "input_type": "search_document"
}

But the Data Loader node outputs documents in a format like:

{
  "pageContent": "...",
  "metadata": { ... }
}

It seems that n8n doesn’t natively support reshaping the output of the Data Loader node to match Cohere’s expected input format. Has anyone found a clean way to transform the document structure within n8n so it can be used directly with Cohere’s Bedrock embedding endpoint?

Any guidance, tips, or node configurations would be appreciated!

Thanks in advance.

Information on your n8n setup

  • **n8n version:** 1.82.3
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): docker
  • Operating system: win11

Hi @estsaiDev,

Integrating the Data Loader node’s output with Cohere’s embedding model in n8n requires a transformation step to align the data formats. Here’s how you can achieve this:

1. Utilize the Code Node for Data Transformation:

The Code node in n8n allows for custom JavaScript code execution, enabling you to reshape the Data Loader’s output to match Cohere’s expected input structure.

  • Add a Code Node: Place the Code node between the Data Loader node and the Cohere embedding node in your workflow.
  • Insert Transformation Code: Use the following JavaScript snippet to transform the data:
  return items.map(item => ({
    json: {
      // Cohere expects an array of strings under "texts"
      texts: [item.json.pageContent],
      input_type: "search_document"
    }
  }));

This script maps over each item from the Data Loader, extracting the pageContent and structuring it into the format required by Cohere.
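One thing to watch when mapping items this way: Cohere's Embed endpoint only accepts a limited number of texts per request (96 at the time of writing, per Cohere's API docs). If your Data Loader emits many documents, it may be safer to batch them instead of one call per document or one giant `texts` array. A minimal sketch, assuming the standard Data Loader item shape and that the 96-item limit applies to your Bedrock model:

```javascript
// Batch Data Loader items into Cohere-sized embedding requests.
// Assumes each input item looks like { json: { pageContent, metadata } }.
const BATCH_SIZE = 96; // Cohere Embed per-request limit; verify for your model

function toCohereBatches(items, batchSize = BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const slice = items.slice(i, i + batchSize);
    batches.push({
      json: {
        texts: slice.map(item => item.json.pageContent),
        input_type: "search_document",
      },
    });
  }
  return batches;
}

// In an n8n Code node you would end with:
// return toCohereBatches(items);
```

Each output item is then one ready-to-send Cohere payload, which keeps you under the request limit without extra looping nodes.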

2. Configure the Cohere Embedding Node:

After the transformation, ensure that the Cohere embedding node is set to process the reformatted data correctly. Refer to the Embeddings Cohere node documentation for detailed configuration options.

3. Alternative Approach Using AI Transform Node:

If you’re on an n8n Cloud plan, you can leverage the AI Transform node to generate the necessary transformation code automatically:

  • Add the AI Transform Node: Place it between the Data Loader and Cohere nodes.
  • Provide Instructions: Input a prompt like, “Reshape Data Loader output to match Cohere embedding input format.”
  • Generate and Apply Code: The node will produce a code snippet to perform the transformation, which you can use directly or adapt as needed.

For more details on using the AI Transform node, refer to the AI Transform documentation.

4. Additional Resources:

Understanding data transformation in n8n is crucial for tasks like this. The Transforming data guide provides insights into various methods and nodes available for data manipulation within n8n.

By implementing these steps, you should be able to seamlessly integrate the Data Loader’s output with Cohere’s embedding model, facilitating efficient document processing within your workflow.

1 Like

Hi @Miquel_Colomer,
Thank you, I really appreciate your detailed answer.
In my case, I wasn’t able to find an option to “add a node” between the Data Loader node and the Vector Store node.
It seems that the Vector Store node doesn’t allow inserting an intermediate transformation step, which makes it difficult to reshape the data format as needed.


1 Like

Could you share your full workflow? That would help us get an overview of the whole process.

1 Like

I understand the challenge you’re facing with inserting an intermediate transformation step between the Data Loader node and the Vector Store node in n8n. The current design of n8n’s AI nodes, such as the Data Loader and Vector Store nodes, doesn’t inherently support inserting standard processing nodes between them. This limitation arises because these AI nodes are structured to work within a specific flow, often requiring direct connections to function as intended.

However, there are alternative approaches you can consider to reshape your data before inserting it into the Vector Store:

  1. Use the Default Data Loader’s Metadata Options: The Default Data Loader node allows you to add metadata to the documents being processed. By configuring the metadata settings within this node, you can include additional information or restructure the data to some extent before it reaches the Vector Store. This approach enables you to enrich your data without needing to insert an intermediate node.

  2. Leverage the Code Node for Custom Transformations: If the transformations you require are complex and cannot be handled within the Default Data Loader’s settings, you might consider using the Code node. While the Code node cannot be directly inserted between the Data Loader and Vector Store nodes, you can design your workflow to process the data separately using the Code node before it enters the Data Loader. This way, you can perform custom transformations on your data, such as reshaping formats or adding specific metadata, before it’s loaded and sent to the Vector Store.

It’s also worth noting that discussions within the n8n community have highlighted similar challenges. For instance, users have explored methods to add custom metadata by utilizing nodes like Set before the Data Loader to include additional fields, which are then recognized as metadata in the Vector Store.
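To illustrate that community approach: a Code (or Set) node placed before the Default Data Loader can attach extra fields to each item, which the loader can then map into document metadata. A hedged sketch, where the field names `source` and `ingestedAt` are only examples I chose, not anything n8n requires:

```javascript
// Enrich raw items before they reach the Default Data Loader.
// The loader's metadata settings can then pick up these fields.
function enrichForLoader(items) {
  return items.map(item => ({
    json: {
      ...item.json,                         // keep the original payload intact
      source: item.json.key ?? "s3",        // hypothetical: S3 object key if present
      ingestedAt: new Date().toISOString(), // example custom metadata field
    },
  }));
}

// In an n8n Code node: return enrichForLoader(items);
```

Because the spread keeps the original fields, the document text is untouched; you are only adding metadata candidates alongside it.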

While these workarounds require adjusting the typical workflow structure, they offer viable solutions to transform and enrich your data before it’s stored in the Vector Store.

Thank you very much for your detailed response and helpful suggestions. I’ve experimented with the approaches you mentioned, and while they’ve been insightful, I’ve encountered a few issues I’d like to clarify — just to ensure I’m not misunderstanding how the nodes are intended to function.

  1. Regarding Option 1 (Using Metadata in the Data Loader):
    I understand that enriching the data by configuring metadata directly in the Data Loader node is a viable method. However, my concern is:

Once the data reaches the Vector Store, will it still correctly recognize which part is the actual document content and which part is the metadata?
In other words, if I restructure or enhance the data format via the Data Loader, could this potentially cause issues where the Vector Store no longer properly distinguishes between the document text and the metadata?

  2. Regarding Option 2 (Using a Code Node before the Data Loader):
    You mentioned that while we can’t place a transformation node between the Data Loader and the Vector Store, we can insert a Code node before the Data Loader to reshape the data.
    However, in my current workflow, it seems that:

The Data Loader is always expected to be directly followed by the Vector Store node, and I haven’t been able to insert any other node before that connection without breaking the flow.
Is this an incorrect assumption on my part? Or is there a specific configuration that allows the Data Loader to receive data from an upstream transformation node like Code or Set?

From what I’ve observed so far, it seems that the AI nodes (Data Loader and Vector Store) in n8n are tightly coupled and require a direct connection to function. If there is a workaround that allows for preprocessing before data enters this chain, I’d be very happy to explore it further.

Thanks again for your insights, and I look forward to your thoughts!

1 Like

I am having the same issue!! Is there a solution?

Hi @estsaiDev

It seems you’re hitting a core limitation: n8n’s Data Loader → AWS Bedrock (Cohere Embed) → Vector Store chain is a closed loop—you can’t insert a standard Code or Set node in the middle to transform the payload.


Why this happens

The Embeddings AWS Bedrock node expects input with keys like texts and input_type, as per Cohere API specs.

However, the Default Data Loader outputs JSON with pageContent (and metadata), which doesn’t align. When connected directly to the embedding node, the mismatch causes errors like:

required key [texts] not found
required key [input_type] not found

As you said, n8n’s UI prevents you from adding another node between the Data Loader and the embedding node; this is a deliberate design choice, not a bug.


Workarounds to reshape the data

  1. Use your own loader + transformation

Instead of the built-in Data Loader, create a manual pipeline:

  1. AWS S3 node (or HTTP node) to fetch documents from your bucket.

  2. Set or Code node to format the JSON like:

  return items.map(item => ({
    json: {
      texts: [item.json.pageContent],
      input_type: "search_document"
    }
  }));

  3. Connect this output to the Embeddings AWS Bedrock node (see the Embeddings AWS Bedrock node documentation in the n8n Docs).

  4. Finally, send the results to the Vector Store node.

This gives you full control to match the Cohere payload requirements.
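If you go fully manual, you can even skip the Embeddings node and call Bedrock's InvokeModel API yourself from an HTTP Request node. A sketch of building that request, assuming the model ID `cohere.embed-multilingual-v3` and the standard Bedrock runtime URL shape (check your region and model access before relying on either):

```javascript
// Build an InvokeModel request for Cohere Embed on AWS Bedrock.
// The modelId and URL pattern are assumptions -- confirm them for your account.
function buildBedrockEmbedRequest(texts, region = "us-east-1") {
  const modelId = "cohere.embed-multilingual-v3";
  return {
    url: `https://bedrock-runtime.${region}.amazonaws.com/model/${modelId}/invoke`,
    body: {
      texts,                          // up to 96 strings per request
      input_type: "search_document",  // use "search_query" at query time
    },
  };
}
```

The HTTP Request node would use this URL with your AWS credentials and send `body` as JSON; the `input_type` distinction matters because Cohere recommends `search_document` for indexing and `search_query` for retrieval.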


  2. Use a separate workflow for embedding

Split into two workflows for clarity and flexibility:

  • Main workflow: loads content from S3.

  • Embedding subworkflow: takes pageContent, transforms the JSON in a Code node, calls Bedrock, then writes to the vector DB.

Connect with Execute Workflow node so you run the embedding logic cleanly and independently.

Treat this only as a first approach to fixing your issue, and read the documentation alongside it—I haven’t tested it myself.