Processing Multiple Document Types for LLM Input in n8n

Hello, everyone! I’m a complete beginner in n8n.

I’m working on a workflow to process documents of various formats and feed them to an LLM. Before I build my implementation, I’d like to get advice from the community on the best approach.

Describe the problem/error/question

I need to create a workflow that:

  1. Accepts multiple documents of different types (doc, docx, xml, xlsx, txt, html, odf, xls)
  2. Processes these documents to extract text content
  3. Combines all extracted text
  4. Sends the combined text to an LLM for analysis

My main challenge is implementing the loop to process each document and then merging all the extracted text together. I’m not sure how to structure this workflow efficiently.

What is the error message (if any)?

No specific error, I’m seeking guidance on implementation.

Share the output returned by the last node

Empty. I would like to get the LLM’s answer.

Information on your n8n setup

  • n8n version: 1.91.2
  • Database (default: SQLite): Postgres
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): docker compose starter-kit
  • Operating system: Ubuntu

Hello @SleekVortex, welcome to the community!

What you can do:

  1. Remove the SplitInBatches node
    You don’t need it here; n8n already splits the files into individual items using the Split Out node.

  2. At the end of each extractor (Word, PDF, etc.), add a Set node
    Use this to rename the extracted text into a common field, like text.

  3. Use a Merge node
    This collects all the text fields together, one item per file.

  4. Add a Code node (called the Function node in older n8n versions) to combine all the texts
    After this step you have everything in one block of text.

  5. Send combined text to your AI Agent
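
As a rough illustration of step 4, here is a minimal sketch of the combining logic in plain JavaScript, the language used by n8n’s Code node (the Function node in older versions). It assumes each incoming item carries the extracted text in a field named text and, optionally, a file name in a field named fileName; both names are only examples of what you might set in the Set node, and combineTexts is a hypothetical helper, not something n8n provides.

```javascript
// Hypothetical helper: joins the `text` field of every n8n item into one string.
// Assumes each item looks like { json: { text: "...", fileName: "..." } },
// where `text` and `fileName` are field names chosen in the Set node (assumption).
function combineTexts(items) {
  return items
    .map((item) => item.json || {})
    // Skip items where extraction produced nothing usable.
    .filter((json) => typeof json.text === "string" && json.text.trim() !== "")
    // Label each chunk so the LLM can tell the documents apart.
    .map((json) => `--- ${json.fileName || "unnamed document"} ---\n${json.text.trim()}`)
    .join("\n\n");
}

// Inside an n8n Code node ("Run Once for All Items" mode) the last line would be:
//   return [{ json: { combinedText: combineTexts($input.all()) } }];

// Standalone example with made-up sample data:
const sampleItems = [
  { json: { fileName: "report.docx", text: "Quarterly results.  " } },
  { json: { fileName: "empty.txt", text: "" } },
  { json: { fileName: "notes.html", text: "Meeting notes." } },
];
const combinedText = combineTexts(sampleItems);
console.log(combinedText);
```

The single combinedText field can then be referenced in the prompt of the AI Agent node. Labelling each chunk with its file name is a design choice, not a requirement; drop it if the LLM does not need to know where each piece of text came from.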


Try putting a workflow together and come back here with specific tech questions.

Cheers