Best Practices for Handling Large Excel Files on OneDrive with a Self-Hosted Instance?

Hello everyone,

I'm looking for some guidance on the best practices for handling a common but challenging scenario: running an incremental update workflow that involves a large Excel file hosted on OneDrive for Business.

My Goal:

I have a workflow that needs to run several times a day. The logic is:

  1. Determine the timestamp of the last successful run.
  2. Query a Microsoft SQL Server database to get only the new records created since that timestamp.
  3. Process these new records (grouping, filtering, etc.).
  4. Add these new processed records to a master Excel file on OneDrive, preserving all the historical data.
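For step 1, one option is to derive the last-run timestamp directly from the rows already in the master file. A minimal sketch in an n8n Code node, assuming the rows arrive as plain objects (e.g. from Extract from File) and a date column named `CreatedAt` (an illustrative assumption, not from the actual file):

```javascript
// Find the latest timestamp among the rows already stored in the master file.
// The column name 'CreatedAt' is an illustrative assumption.
function lastRunTimestamp(rows) {
  let max = null;
  for (const row of rows) {
    const t = new Date(row.CreatedAt);
    if (!Number.isNaN(t.getTime()) && (max === null || t > max)) max = t;
  }
  return max; // null when no row has a parsable date yet
}

const rows = [
  { CreatedAt: '2024-05-01 08:00:00' },
  { CreatedAt: '2024-05-02 09:30:00' },
];
const last = lastRunTimestamp(rows); // Date for 2024-05-02 09:30:00
```

The returned value would then feed the `WHERE CreatedAt > ...` filter of the SQL query in step 2.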

The Problem:

My master Excel file is already large (~7-10 MB) and will continue to grow. I'm constantly running into Microsoft Graph API limitations, and I'm struggling to find a robust pattern that doesn't fail.

I have tried using the Microsoft Excel node, but any operation that needs to read the file's content (like Get Rows or even trying to get a list of the Tables in the sheet) fails with various errors, most commonly:

  • 400 - ResponsePayloadSizeLimitExceeded
  • 404 - Not Found when trying to select a Table, likely due to a timeout while parsing the large file.
  • Conversion failed when converting date/time... when passing date strings that the API doesn't like.
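The last error usually means SQL Server received a date literal in a format it cannot parse under the session's DATEFORMAT/LANGUAGE settings. One way to sidestep it is to always pass the ISO 8601 form `YYYY-MM-DDTHH:mm:ss`, which SQL Server parses unambiguously. A small sketch of that conversion:

```javascript
// Format a JS Date for SQL Server. The ISO 8601 form 'YYYY-MM-DDTHH:mm:ss'
// is parsed unambiguously regardless of the session's DATEFORMAT setting,
// unlike e.g. 'DD/MM/YYYY', which can trigger
// "Conversion failed when converting date/time".
function toSqlServerDate(d) {
  return d.toISOString().slice(0, 19); // drop milliseconds and the 'Z'
}

const literal = toSqlServerDate(new Date('2024-05-01T12:30:00.000Z'));
// literal === '2024-05-01T12:30:00'
```

Note that `toISOString()` emits UTC; if the database column stores local time, the value would need an offset adjustment first.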

I understand that the recommended approach for large files is to avoid the Microsoft Excel node and instead use the OneDrive node (to Download/Upload) in combination with the Extract from File / Convert to File nodes.

My Question:

Given that I am on a self-hosted n8n instance and will always be working with large datasets, what is the definitive, community-approved "best practice" for this incremental update loop?

Specifically:

  • For Reading/Getting the Timestamp: Should I use the OneDrive (Get) operation to fetch the file's lastModifiedDateTime property? Or is it better to Download the entire file and use Extract from File to read a specific column and find the max date?
  • For Writing/Appending: Since I can't use the Microsoft Excel (Append Rows) node due to the file size, is the correct pattern to always:
    1. Download the full original file.
    2. Read it with Extract from File.
    3. Get new data from SQL.
    4. Merge the old and new data together inside the workflow.
    5. Use Convert to File to create a new, complete Excel file.
    6. Use OneDrive (Upload) to overwrite the old file.

This "read-all, merge, write-all" pattern seems like the most robust way to avoid the API errors, but I want to confirm if this is the standard approach or if there's a more efficient way to handle this on a self-hosted instance.
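The merge in step 4 can be sketched in a Code node. Assuming both row sets share the same column layout and a unique key column, here called `Id` (an illustrative assumption), deduplicating on that key makes re-runs safe if the time windows ever overlap:

```javascript
// Merge historical rows (from Extract from File) with new rows (from SQL),
// deduplicating on a unique key so an overlapping re-run does not
// double-insert. 'Id' is an illustrative column name.
function mergeRows(oldRows, newRows, key = 'Id') {
  const byKey = new Map();
  for (const row of [...oldRows, ...newRows]) byKey.set(row[key], row);
  return [...byKey.values()];
}

const merged = mergeRows(
  [{ Id: 1, Name: 'a' }, { Id: 2, Name: 'b' }],
  [{ Id: 2, Name: 'b' }, { Id: 3, Name: 'c' }],
);
// merged contains 3 rows: Ids 1, 2, 3
```

Because a later row with the same key overwrites an earlier one in the Map, new records also win if a historical row was ever corrected at the source.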

My Setup:

n8n Version: 1.110.1
Hosting: Self-Hosted
Database: Microsoft SQL Server
File Storage: OneDrive for Business (SharePoint)

Thanks so much for sharing your expertise and any example workflows you might have!

Hello @Ruben_G,

Best practice is not to use Excel for anything larger than a couple of MB. It is better to use a database for that. Even CSV is better, because you can easily append content to it.

Additionally, Excel has a significant limitation: a worksheet can hold at most 1,048,576 rows (about 1 million).

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.