I am trying to automate the extraction of teaching strategies from a 45-page DOCX file (Virginia Department of Education Instructional Guide). I need to extract specific “Instructional Approaches” and map them to their “Standard Codes” in an Excel sheet.
Current Workflow:
1. File Ingest: Using the @mazixmazix/n8n-nodes-converter-documents node to get the DOCX into Markdown.
2. Chunking: Using a Code Node to split the text into 87 individual items based on section headers (Reading, Writing, etc.).
3. Looping: I am using a Loop Over Items node to process these one-by-one.
4. AI Processing: Each item goes to an AI node to extract the data into JSON.
5. Excel: Appending each result to an Excel sheet.
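For reference, the header-based split in step 2 might look roughly like this. This is a hypothetical sketch of the splitting logic written as a plain function — in n8n it would live inside the Code node and read the Markdown from the input item; the function name and the `{ json: ... }` item shape are illustrative:

```javascript
// Sketch of the step-2 chunking: split a Markdown document into one
// item per level-2 section header (## Reading, ## Writing, ...).
// Each chunk keeps its header line so downstream AI calls retain context.
function splitByHeaders(markdown) {
  // Lookahead split: cut at every line that starts with "## "
  const parts = markdown.split(/^(?=## )/m);
  return parts
    .map(text => text.trim())
    .filter(Boolean)
    .map(text => ({
      // n8n items carry their data under a `json` key
      json: {
        header: text.split('\n')[0].replace(/^#+\s*/, '').trim(),
        content: text,
      },
    }));
}
```

In a Code node you would return the resulting array directly so each chunk becomes its own item for the loop (or sub-workflow dispatch) downstream.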
The Problem:
Processing 87 items sequentially takes a very long time (15+ minutes), and the loop sometimes “hangs” if the AI returns an empty result or the connection blips.
Questions for the Community:
1. What is the most stable way to “loop” through 80+ items and write to Excel without the workflow timing out?
2. Is there a better way to “chunk” a 45-page Word document than using regex in a Code node?
3. How can I make the workflow more resilient so if item #40 fails, the rest of the 87 items still finish?
What is the error message (if any)?
Please share your workflow
(Select the nodes on your canvas and use the keyboard shortcuts CMD+C/CTRL+C and CMD+V/CTRL+V to copy and paste the workflow.)
Share the output returned by the last node
Information on your n8n setup
n8n version: 1.71.1
Database (default: SQLite): Postgres 16
n8n EXECUTIONS_PROCESS setting (default: own, main): own
Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
You can always disable EXECUTIONS_TIMEOUT / EXECUTIONS_TIMEOUT_MAX.
Converting the docs to Markdown and then using a text splitter would be a nice, clean solution.
For the AI Agent, the best setting is Retry on Fail; Continue on Error would pile up error-filled data, which is not good for a scalable system.
I think using a sub-workflow here would be a strong option. If you use it with Wait for Sub-Workflow Completion turned off, the 87 items can be executed roughly in parallel instead of sequentially. This significantly reduces total runtime and avoids looping issues.
IMO, using a sub-workflow is the most stable approach here: each execution runs independently, so a delay or failure in one item does not block the rest. Instead of looping, your main workflow should dispatch tasks, and each item should be handled by the sub-workflow as a standalone execution.
If the document has a table of contents or a consistent section/page structure, I would chunk by pages or major sections rather than heavy regex parsing, to preserve context.
I believe this is what sub-workflows are designed for.
So in the end, your main workflow becomes the orchestration layer that distributes tasks, while the sub-workflow handles the logic for processing a single item.
Because each item runs independently:

- A failure in item #40 won’t affect items #41–#87
- Errors can be handled and logged per item
- Retries and error management become much easier and more effective

In this architecture, you gain both resilience and scalability.
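The section-based chunking suggested above could be sketched like this. This is a hypothetical example, not anything built into n8n: `chunkWithContext` and the character limit are illustrative names/values, and the idea is simply that every chunk carries its parent section title so the AI keeps context even when a long section is split further:

```javascript
// Chunk by major sections, then sub-split long sections by size.
// Each chunk is re-prefixed with its "## <title>" header so the AI
// model always knows which section the text came from.
function chunkWithContext(markdown, maxChars = 4000) {
  const sections = markdown.split(/^(?=## )/m).filter(s => s.trim());
  const chunks = [];
  for (const section of sections) {
    const lines = section.split('\n');
    const title = lines[0].replace(/^#+\s*/, '').trim();
    let buffer = '';
    for (const line of lines.slice(1)) {
      // Flush the buffer when the next line would exceed the limit
      if (buffer.length + line.length > maxChars && buffer) {
        chunks.push({ section: title, text: `## ${title}\n${buffer}` });
        buffer = '';
      }
      buffer += line + '\n';
    }
    if (buffer.trim()) {
      chunks.push({ section: title, text: `## ${title}\n${buffer}` });
    }
  }
  return chunks;
}
```

Each returned chunk could then become one item dispatched to the sub-workflow, with `section` kept as metadata for per-item error logging.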
I need the AI to break this up by standard code, standard definition, teacher notes, considerations, and instructional approach. Given how large the document is, I have had a hard time figuring out the best approach to chunking it so that my AI model captures everything I need.
Hey @Jonathan_Wilson, a friend of mine worked on a similar workflow where he had to process huge order confirmation documents, and that workflow also kept timing out.
He had the documents coming in as PDFs and used the PDF-lib node (https://www.npmjs.com/package/n8n-nodes-pdf-lib) to split them into chunks. He then added multiple extraction nodes for different sections (header, footer, summary).
@Jonathan_Wilson Using vector embeddings of the document would be a better approach if the document is not going to be updated frequently. If it is not static, consider converting the PDF to Markdown and working from that; that way you won’t hit API limits and AI Agent timeouts.