Strategy for processing large XML (174MB)

Hi everyone,

We are running into a deadlock trying to process a large product feed (XML, ~174MB) on n8n Cloud. We are hoping someone has a clever workaround for processing large files when the source server aggressively forces compression.

The Context:

  • Environment: n8n Cloud (v1.121.3).

  • Source: A client’s product feed (hosted on Channable/Cloudflare).

  • File Size: 174MB XML.

  • Goal: Split the XML into items and process them in batches with AI.

The Blockers:

1. “Download All” leads to OOM

Since we are on n8n Cloud, we hit hard memory limits. We are running with binaryDataMode: filesystem, so downloading the file itself is fine. However, passing this 174MB file to the XML to JSON node causes an immediate Out of Memory (OOM) crash, presumably because it builds the entire JSON object in RAM before splitting.

2. “Streaming/Chunking” fails due to Forced GZIP

We attempted to build a “Manual Chunking” workflow using the HTTP Request node with Range headers (e.g., bytes=0-100000).

  • The problem: The source server (Cloudflare) ignores Accept-Encoding: identity. It forces a Content-Encoding: gzip response even for partial content.

  • The result: We receive a partial chunk of a GZIP stream. Since the middle chunks carry neither the gzip header nor the decompression state built up from the start of the stream, n8n cannot decompress them (error: unknown compression method, or just garbage characters).

3. “Manual Streaming” in the Code Node is restricted

We tried writing a Code Node that streams the binary data from disk via await this.helpers.getBinaryStream(0, 'data'), so we could manually parse/slice the XML without loading it all into RAM.

  • The problem: We get Error: The function "helpers.getBinaryStream" is not supported in the Code Node (likely a Cloud restriction).

  • Using getBinaryDataBuffer on the full file also causes OOM.

The Question: Is there any way in n8n Cloud to:

A) Stream/parse a large XML file from disk (filesystem mode) line-by-line, without first loading the whole structure into JSON?

B) Successfully handle a forced GZIP response on an HTTP Range request?

We are stuck between OOM on one side and GZIP corruption on the other. Any pointers would be greatly appreciated!

Thanks!

Hi @David_Drake ,

I’ll try to help you. I find this question really interesting, and it is a tricky situation indeed. On top of that, the Code Node on n8n Cloud is limited for security reasons, so any hacks I have in mind wouldn’t quite work. Why not use some kind of bridge to resolve the blockage, e.g., an Azure Function or AWS Lambda that handles ingesting the troublesome XML? It could then place the file on Google Drive (or somewhere else) where streaming/chunking is no longer a problem for n8n. You can trigger the Lambda to do the heavy lifting and, when it’s done, have it hit a callback so the n8n workflow knows it can continue to chug along (with the XML now on Google Drive and no Cloudflare forced-GZIP issues).

This is the approach I would tend to go with, though maybe that’s because I’m a big fan of Azure Functions and AWS Lambda.

Khem


Hi Khem,

Thanks for the insight! That makes total sense. It confirms our suspicion that we are hitting a hard infrastructure limit on the Cloud plan rather than just a logic error.

Using an intermediate “bridge” to handle the fetch-and-decompress step seems like the most robust architecture to avoid the GZIP/Memory deadlock. We will likely look into setting that up to ensure scalability.

Really appreciate you confirming that “hacks” inside the Code Node won’t fly here — saved us a lot of debugging time!


@David_Drake , let me know how it goes