Hello everyone,
I’m building a workflow for automated invoice data extraction from various file formats into a database (starting with a spreadsheet). I’m aiming for a solution that is as universal as possible.
Project Overview
My workflow needs to process invoices in various formats, including XLS/XLSX, and prepare the data for an AI agent for final processing.
Problem / Edge Case
I’ve encountered an issue with Excel invoices that have multiple sheets:
- The built-in **Extract from File (XLS)** node only reads the data from the first sheet.
- In one case, the first sheet was hidden and contained ‘garbage’ data that I do not need. I discovered this sheet manually.
- To create a truly universal workflow, I need to account for this possibility of hidden or unwanted data on the first sheet.
Desired Outcome
My goal is to:
- Read the raw text/data from all sheets in the Excel file (XLS/XLSX).
- Aggregate all extracted text into a single field called `text`.
- Pass this aggregated `text` field to a downstream AI agent for further processing.
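To make this concrete, here is a rough sketch of the aggregation step I have in mind. It is not n8n API code: the `SheetData` shape and the `sheetToText` and `aggregateSheets` helpers are hypothetical names, and I am assuming some earlier step (for example a Code node using a spreadsheet library) has already parsed every sheet into rows and knows whether each sheet is hidden.

```typescript
// Hypothetical shape for one parsed worksheet; not an n8n or library type.
interface SheetData {
  name: string;
  hidden: boolean; // assumed to be supplied by whatever parses the workbook
  rows: Array<Array<string | number | boolean | null>>;
}

// Turn one sheet into tab-separated lines, dropping rows that are entirely empty.
function sheetToText(sheet: SheetData): string {
  return sheet.rows
    .map((row) => row.map((cell) => (cell === null ? "" : String(cell).trim())))
    .filter((cells) => cells.some((c) => c !== ""))
    .map((cells) => cells.join("\t"))
    .join("\n");
}

// Combine all sheets into a single `text` field, optionally skipping hidden sheets.
function aggregateSheets(sheets: SheetData[], skipHidden = true): { text: string } {
  const parts: string[] = [];
  for (const sheet of sheets) {
    if (skipHidden && sheet.hidden) continue; // ignore hidden "garbage" sheets
    const body = sheetToText(sheet);
    if (body === "") continue; // ignore sheets with no content
    parts.push(`--- Sheet: ${sheet.name} ---\n${body}`);
  }
  return { text: parts.join("\n\n") };
}
```

The sheet-name header line is only there so the downstream AI agent can tell which sheet a block of text came from. Passing `skipHidden = false` would keep hidden sheets in case they turn out to hold real invoice data.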
Question
Can you recommend a clean, reliable n8n solution for reading and aggregating data from all sheets within a single XLS/XLSX file?
I am also considering an alternative:
- Converting all sheets of the Excel file into a single PDF file and then sending that file through my existing PDF parser route. Is this a viable and more stable route?
System Setup
- n8n Version: 1.116.2
- Installation: Local via Docker
- Operating System: Ubuntu
Thanks in advance for your insights!