Universal Invoice Extraction: How to Read All Sheets from an XLS/XLSX File?

Hello everyone,

I’m building a workflow that automatically extracts invoice data from various file formats and loads it into a database (starting with a spreadsheet). I’m aiming for a solution that is as universal as possible.

Project Overview

My workflow needs to process invoices in various formats, including XLS/XLSX, and prepare the data for an AI agent for final processing.

Problem / Edge Case

I’ve encountered an issue with Excel invoices that have multiple sheets:

  • The built-in **Extract from File (XLS)** node only reads the data from the first sheet.

  • In one case, the first sheet was hidden and contained ‘garbage’ data that I do not need. I discovered this sheet manually.

  • To create a truly universal workflow, I need to account for this possibility of hidden or unwanted data on the first sheet.

Desired Outcome

My goal is to:

  1. Read the raw text/data from all sheets in the Excel file (XLS/XLSX).

  2. Aggregate all extracted text into a single field called `text`.

  3. Pass this aggregated text field to a downstream AI agent for further processing.
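To make the desired outcome concrete, here is a minimal Python sketch of the aggregation logic, written outside n8n. It is an illustration under assumptions, not a confirmed n8n solution: the `Sheet` container and the function names are hypothetical, the `--- Sheet: ... ---` separator is just one possible layout, and the optional loader assumes the third-party `openpyxl` package is installed (it reads XLSX only, so legacy XLS files would need a different reader).

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Sequence


@dataclass
class Sheet:
    """Hypothetical container for one worksheet's contents."""
    name: str
    rows: Sequence[Sequence[Any]]
    hidden: bool = False


def sheet_to_text(sheet: Sheet) -> str:
    """Flatten one sheet: one line per non-empty row, cells joined by tabs."""
    lines: List[str] = []
    for row in sheet.rows:
        # Drop empty cells (None or whitespace-only) before joining.
        cells = [str(c).strip() for c in row if c is not None and str(c).strip()]
        if cells:
            lines.append("\t".join(cells))
    return "\n".join(lines)


def aggregate_sheets(sheets: Iterable[Sheet], skip_hidden: bool = True) -> Dict[str, str]:
    """Combine all sheets into a single {'text': ...} item for a downstream step."""
    parts: List[str] = []
    for sheet in sheets:
        if skip_hidden and sheet.hidden:
            continue  # ignore hidden sheets, e.g. a hidden first sheet with junk data
        body = sheet_to_text(sheet)
        if body:
            parts.append(f"--- Sheet: {sheet.name} ---\n{body}")
    return {"text": "\n\n".join(parts)}


def load_sheets_xlsx(path: str) -> List[Sheet]:
    """Optional loader. Assumption: openpyxl is installed; handles XLSX, not XLS."""
    from openpyxl import load_workbook  # imported lazily so the rest runs without it

    workbook = load_workbook(path, data_only=True)  # data_only: cached values, not formulas
    return [
        Sheet(
            name=ws.title,
            rows=list(ws.iter_rows(values_only=True)),
            hidden=ws.sheet_state != "visible",  # 'hidden' or 'veryHidden'
        )
        for ws in workbook.worksheets
    ]
```

With this shape, `aggregate_sheets(load_sheets_xlsx("invoice.xlsx"))` would return one item whose `text` field holds every visible sheet, and passing `skip_hidden=False` would keep hidden sheets too (the file name here is a placeholder).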

Question

Can you advise a clean, reliable n8n solution for reading and aggregating data from all sheets within a single XLS/XLSX file?

I am also considering an alternative:

  • Converting all sheets of the Excel file into a single PDF file and then sending that file through my existing PDF parser route. Is this a viable and more stable route?

System Setup

  • n8n Version: 1.116.2

  • Installation: Local via Docker

  • Operating System: Ubuntu

Thanks in advance for your insights!
