Universal Invoice Extraction: How to Read All Sheets from an XLS/XLSX File?

Hello everyone,

I’m building a workflow that automatically extracts invoice data from various file formats and writes it to a database (starting with a spreadsheet). I’m aiming for a solution that is as universal as possible.

Project Overview

My workflow needs to process invoices in various formats, including XLS/XLSX, and prepare the data for an AI agent for final processing.

Problem / Edge Case

I’ve encountered an issue with Excel invoices that have multiple sheets:

  • The built-in **Extract from File (XLS)** node only reads the data from the first sheet.

  • In one case, the first sheet was hidden and contained ‘garbage’ data that I don’t need. I only discovered this sheet by inspecting the file manually.

  • To create a truly universal workflow, I need to account for this possibility of hidden or unwanted data on the first sheet.
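
For context, the hidden flag is stored in the file itself: in an XLSX, `xl/workbook.xml` lists every sheet with an optional `state` attribute (`hidden` or `veryHidden`; a missing attribute means visible). A minimal standard-library Python sketch that lists sheets and their visibility, so the check doesn't have to be manual. The function name `list_sheets` is my own, and legacy binary XLS files are not covered:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML elements in xl/workbook.xml.
M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheets(xlsx_bytes: bytes) -> list[tuple[str, str]]:
    """Return (name, state) for each sheet, in workbook order.

    state is "visible", "hidden" or "veryHidden"; a missing
    attribute means the sheet is visible.
    """
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        workbook = ET.fromstring(zf.read("xl/workbook.xml"))
    return [(sheet.get("name"), sheet.get("state", "visible"))
            for sheet in workbook.findall(f"{M}sheets/{M}sheet")]
```

Filtering is then a one-liner, e.g. `[name for name, state in list_sheets(data) if state == "visible"]`.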

Desired Outcome

My goal is to:

  1. Read the raw text/data from all sheets in the Excel file (XLS/XLSX).

  2. Aggregate all extracted text into a single field called `text`.

  3. Pass this aggregated text field to a downstream AI agent for further processing.

Question

Can anyone recommend a clean, reliable n8n solution for reading and aggregating data from all sheets within a single XLS/XLSX file?

I am also considering an alternative:

  • Converting all sheets of the Excel file into a single PDF file and then sending that file through my existing PDF parsing route. Is this viable, and would it be more stable?

System Setup

  • n8n Version: 1.116.2

  • Installation: Local via Docker

  • Operating System: Ubuntu

Thanks in advance for your insights!

Hey! I’ve built similar universal invoice workflows. Here’s what’s worked:

For multi-sheet Excel:

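The cleanest approach is the Spreadsheet File node (or Extract from File) with the “Raw Data” option enabled:

  1. Read each sheet (these nodes read one sheet per run, selected via the Sheet Name option, so you need the list of sheet names first)
  2. Loop through the sheets
  3. Aggregate the results into a single `text` field
  4. Pass that field to the AI agent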
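
If looping over sheet names inside n8n gets fiddly, steps 1 to 3 can also be done in one pass. A rough standard-library Python sketch (for example in a Code node or a small helper service), assuming XLSX input rather than legacy XLS and that you already have the file as bytes. `xlsx_to_text` and its helpers are hypothetical names; it skips hidden sheets by default, joins cells with tabs, and returns raw stored values, so dates come out as Excel serial numbers and empty cells are not padded:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside an XLSX package.
M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
PKG = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def _shared_strings(zf: zipfile.ZipFile) -> list[str]:
    """Load the shared-string table (absent in some workbooks)."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    # Each <si> holds a single <t> or several rich-text runs, each with a <t>.
    return ["".join(t.text or "" for t in si.iter(M + "t"))
            for si in root.iter(M + "si")]


def _cell_text(cell: ET.Element, shared: list[str]) -> str:
    """Return a cell's raw stored value as a string."""
    kind = cell.get("t")
    if kind == "inlineStr":
        return "".join(t.text or "" for t in cell.iter(M + "t"))
    value = cell.find(M + "v")
    if value is None or value.text is None:
        return ""
    if kind == "s":  # index into the shared-string table
        return shared[int(value.text)]
    return value.text  # numbers, booleans, cached formula results


def xlsx_to_text(xlsx_bytes: bytes, skip_hidden: bool = True) -> dict:
    """Read every (visible) sheet and aggregate it into one {"text": ...} item."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        shared = _shared_strings(zf)
        # Map relationship ids (r:id) to worksheet part paths.
        rels = ET.fromstring(zf.read("xl/_rels/workbook.xml.rels"))
        targets = {rel.get("Id"): rel.get("Target")
                   for rel in rels.iter(PKG + "Relationship")}
        workbook = ET.fromstring(zf.read("xl/workbook.xml"))
        parts = []
        for sheet in workbook.findall(f"{M}sheets/{M}sheet"):
            if skip_hidden and sheet.get("state", "visible") != "visible":
                continue
            target = targets[sheet.get(R + "id")]
            # Targets are usually relative to xl/, occasionally absolute.
            path = target.lstrip("/") if target.startswith("/") else "xl/" + target
            rows = []
            for row in ET.fromstring(zf.read(path)).iter(M + "row"):
                cells = [_cell_text(c, shared) for c in row.findall(M + "c")]
                if any(cells):  # drop rows with no content
                    rows.append("\t".join(cells))
            parts.append(f"### Sheet: {sheet.get('name')}\n" + "\n".join(rows))
    return {"text": "\n\n".join(parts)}
```

The returned dict already has the single `text` field the AI agent expects; the `### Sheet:` markers just give the model some context about where each block came from.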

That said, your PDF conversion idea is honestly MORE reliable for universal processing:

Why the PDF route is better:

  • Consistent format regardless of source (XLS, XLSX, CSV)
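  • Hidden sheets are dealt with at conversion time (print-style PDF exports normally skip them, which suits your case, but verify this with your converter)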
  • Tables/formatting preserved better
  • Easier to handle edge cases (merged cells, formulas)
  • One parsing path instead of multiple format handlers

Quick implementation:

  1. Convert Excel → PDF (LibreOffice via Docker or CloudConvert API)
  2. Parse the PDF with a structure-preserving extractor (e.g. one that outputs Markdown)
  3. The AI agent processes the resulting, consistently formatted Markdown
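
For step 1 with LibreOffice, the headless `soffice --headless --convert-to pdf --outdir <dir> <file>` command does the conversion. A small Python wrapper as a sketch, assuming `soffice` is on the PATH of whatever container runs it (as far as I know the stock n8n image doesn't ship LibreOffice, so this typically lives in a custom image or a separate container called from n8n). The function names are illustrative:

```python
import subprocess
from pathlib import Path


def libreoffice_pdf_cmd(src: str, outdir: str, binary: str = "soffice") -> list[str]:
    """Build the headless LibreOffice command that converts src to PDF."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", outdir, src]


def convert_to_pdf(src: str, outdir: str, timeout: int = 120) -> Path:
    """Run the conversion and return the path of the generated PDF."""
    subprocess.run(libreoffice_pdf_cmd(src, outdir), check=True, timeout=timeout)
    # LibreOffice keeps the base name and swaps the extension.
    pdf = Path(outdir) / (Path(src).stem + ".pdf")
    # soffice may exit cleanly even when nothing was written, so check.
    if not pdf.exists():
        raise FileNotFoundError(f"conversion produced no file: {pdf}")
    return pdf
```

The existence check is there because a zero exit code from `soffice` isn't a reliable success signal on its own.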

For invoice extraction specifically, the PDF route gives you:

  • Same workflow for Excel, Word, and PDF invoices
  • Better handling of complex layouts
  • Consistent output format for AI
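
The “same workflow” part mostly comes down to routing on file type before anything else. In n8n that would be a Switch node on the file extension; the logic amounts to something like this Python sketch (the branch names and the extension set are assumptions, so adjust them to the formats you actually receive):

```python
from pathlib import PurePath

# Hypothetical set of formats that get converted before the PDF parser.
NEEDS_CONVERSION = {".xls", ".xlsx", ".ods", ".csv", ".doc", ".docx", ".odt"}


def route(filename: str) -> str:
    """Return which branch of the workflow a file should take."""
    ext = PurePath(filename).suffix.lower()
    if ext == ".pdf":
        return "parse_pdf"
    if ext in NEEDS_CONVERSION:
        return "convert_then_parse_pdf"
    return "unsupported"
```

Everything except the `unsupported` branch ends up on the same PDF parsing path, which is where the maintenance savings come from.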

I use this approach for 8+ clients processing mixed invoice formats. The “convert everything to PDF first” strategy simplified maintenance significantly.

What invoice formats are you seeing most? That might determine the best path forward.