I’m trying to build a workflow in n8n to prepare data for a RAG pipeline with Pinecone. The workflow needs to fetch HTML documents stored in Google Drive.
For one software package, all HTML files are in a single folder — that part works fine. But for another package, the documentation is spread across dozens of nested subfolders, and here’s where I’m stuck.
What I need:
Recursively go through a root folder and all its subfolders.
Collect the IDs of all HTML files I encounter.
Store these IDs in a list (or some storage outside the loop) so that I can later pass them to a Google Drive → Download node.
The issues I’m facing:
I can iterate through the root folder and list its immediate subfolders, but I can't get the workflow to descend into the nested subfolders below them.
Even when I process HTML files inside a loop, their IDs don't get stored or aggregated anywhere outside the loop.
Has anyone built something similar? What’s the best way to configure the loops and data aggregation in n8n so that I can:
Recursively search through all subfolders.
Collect and output all HTML file IDs in one place for further processing.
You can do this in n8n with a single Code node (called Function / Function Item in older versions) that handles the recursion itself, instead of trying to build it out of Loop nodes alone. The idea: inside the node, call the Google Drive files.list endpoint for a folder, push the ID of every HTML file into an array, and for every subfolder found, call the same function again with that subfolder's ID until there are no more subfolders.
Once the recursion finishes, return the full array so the next node (your Download node) receives a single list of all collected IDs. This avoids the problem of data getting trapped inside loop iterations and lets you walk the entire folder tree in one pass.
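A minimal sketch of that recursion, with the Drive call factored out into a `listChildren` callback so the walk itself is easy to test. The function and field names here are illustrative: in an n8n Code node, `listChildren` would wrap `this.helpers.httpRequest` against `https://www.googleapis.com/drive/v3/files` with a query like `'<folderId>' in parents and trashed = false`, returning each file's `id`, `name`, and `mimeType`.

```javascript
// Recursively walk a folder tree and collect the IDs of all HTML files.
// `listChildren(folderId)` must resolve to an array of { id, name, mimeType }.
async function collectHtmlIds(folderId, listChildren, ids = []) {
  const children = await listChildren(folderId);
  for (const item of children) {
    if (item.mimeType === 'application/vnd.google-apps.folder') {
      // Subfolder: recurse into it, reusing the same accumulator array.
      await collectHtmlIds(item.id, listChildren, ids);
    } else if (item.mimeType === 'text/html' || item.name.endsWith('.html')) {
      // HTML file: record its ID for the later Download node.
      ids.push(item.id);
    }
  }
  return ids;
}

// Demo with an in-memory folder tree standing in for Google Drive.
const tree = {
  root: [
    { id: 'a.html', name: 'a.html', mimeType: 'text/html' },
    { id: 'sub1', name: 'sub1', mimeType: 'application/vnd.google-apps.folder' },
  ],
  sub1: [
    { id: 'b.html', name: 'b.html', mimeType: 'text/html' },
    { id: 'sub2', name: 'sub2', mimeType: 'application/vnd.google-apps.folder' },
  ],
  sub2: [{ id: 'c.html', name: 'c.html', mimeType: 'text/html' }],
};

collectHtmlIds('root', async (id) => tree[id] || []).then((ids) =>
  console.log(ids) // [ 'a.html', 'b.html', 'c.html' ]
);
```

Returning `ids` as the node's output (in n8n, mapped to items like `ids.map(id => ({ json: { id } }))`) gives the downstream Download node one flat list, which is exactly the aggregation the Loop nodes were losing.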