Reading Decompressed Files on n8n Cloud

Describe the problem/error/question

I’m trying to take a sitemap_index.xml file, download each of the gzipped sitemaps it references, decompress them, and then turn them into tabular data for uploading to a spreadsheet or a similar system.
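Outside n8n, the pipeline described above (read the index, download each .gz sitemap, decompress it, flatten it into rows) can be sketched with the Python standard library. This is a minimal sketch under a few assumptions: the sitemaps use the standard sitemaps.org 0.9 namespace, only the `loc` and `lastmod` columns are kept, and `fetch` is a hypothetical stand-in for the HTTP download so the example runs offline. It illustrates the data flow only and is not how the n8n nodes are implemented.

```python
import csv
import gzip
import io
import xml.etree.ElementTree as ET
from typing import Callable

# Standard sitemap protocol namespace (assumed; see sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(index_xml: bytes) -> list[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def urlset_rows(urlset_xml: bytes) -> list[dict]:
    """Flatten a <urlset> document into one row per <url> element."""
    root = ET.fromstring(urlset_xml)
    rows = []
    for url in root.findall("sm:url", NS):
        rows.append({
            "loc": url.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": url.findtext("sm:lastmod", default="", namespaces=NS).strip(),
        })
    return rows


def sitemap_index_to_csv(index_xml: bytes, fetch: Callable[[str], bytes]) -> str:
    """Download, gunzip and flatten every sitemap listed in the index into CSV text."""
    rows = []
    for location in sitemap_locations(index_xml):
        compressed = fetch(location)             # download step (hypothetical fetcher)
        xml_bytes = gzip.decompress(compressed)  # decompression step, all in memory
        rows.extend(urlset_rows(xml_bytes))      # bytes -> structured rows
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["loc", "lastmod"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a real script, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`. Note that the decompressed bytes are passed straight to the parser; nothing is written to or read back from disk.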

We are on n8n Cloud, so it’s somewhat ambiguous which parts of the documentation apply to us. I can download the gzipped files, and at that step I clearly have easy access to the binary data. After the decompression step, however, things get murky: inspecting the output objects and the instructions for the Read Binary File node doesn’t make it clear what is expected.

Things I’ve tried:

  • reading the files with patterns like *.xml, **/*.xml
  • following non-cloud examples of people reading files and trying the same kinds of paths they use
  • reading the files by the file names shown on the binary objects output by the decompression step

I just don’t understand why:

  • the decompression step’s output has no direct reference to the decompressed files
  • the Read Binary File node produces no usable output when you execute it
  • the behavior of these nodes doesn’t seem aligned with that of every other n8n node I’ve used for getting or manipulating data. IMO, every node that outputs something should provide everything needed to use that data in whichever node you’d likely use as the next step
  • there is no documentation specific to the cloud service about these aspects

Please share your workflow

Share the output returned by the last node

There is zero output, which is the problem. How can the Read Binary Files node produce no useful information at all? If we really have to read the files from some kind of disk, why isn’t there a file-browser component of some kind?

Information on your n8n setup

  • n8n version: 1.7.1
  • Database (default: SQLite): NA
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): cloud
  • Operating system: NA

Hey @dn_nroth,

I can see why there is some confusion on this one. The Read Binary Files node is only needed if you have written the binary data to disk or are reading it locally from another source. If the binary items are already in your workflow, you can use the Convert to/from binary data node to convert the binary data to JSON and work with it from there. We don’t tend to write the files to disk; instead we keep them in memory, or cached on disk while the workflow runs, so that they are available to the rest of the workflow without extra steps.
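The point that binary data travels with the item, rather than sitting on a disk you have to read back, can be illustrated with a simplified model. Everything here is hypothetical: `Item` is a stand-in for a workflow item (JSON fields plus named binary properties), and `binary_to_json` only mimics the idea of a binary-to-JSON conversion step; it is not n8n’s API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a workflow item: JSON fields plus named binary properties."""
    json: dict = field(default_factory=dict)
    binary: dict[str, bytes] = field(default_factory=dict)


def binary_to_json(item: Item, source_property: str = "data",
                   destination_key: str = "data", encoding: str = "utf-8") -> Item:
    """Decode one in-memory binary property into a JSON field; no file path or disk read involved."""
    if source_property not in item.binary:
        # The step needs the property *name*, not a file name or a glob pattern.
        raise KeyError(f"item has no binary property {source_property!r}; "
                       f"available: {sorted(item.binary)}")
    text = item.binary[source_property].decode(encoding)
    return Item(json={**item.json, destination_key: text}, binary=dict(item.binary))
```

The design choice being illustrated: the next step addresses the data by the binary property name on the item it receives, which is why patterns like `*.xml` or `**/*.xml` find nothing when the files were never written to disk.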

The workflow below should do the job for you. Let me know how you get on with it.

This is where I was having trouble. When I downloaded the sitemap files, I could see a reference to the binary data in the items output by that step (the binary data property). After decompressing them, however, the output of the decompression node looked different: I no longer saw a binary data reference to pass through, only references to and wording about files and file names, which made me think I needed to read them from disk.

I had expected that the decompression step, like the step before it, would expose a binary data property on its output. The file_0 property you referenced in the next step is still somewhat confusing to me.
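One way to picture where a name like `file_0` comes from is a simplified model of a decompression step that stores its output under a prefix plus an index instead of reusing the input property name. This is an assumption-labelled sketch: `Item` and `decompress_item` are hypothetical, and the `file_` prefix simply mirrors the property used in the reply’s workflow; check the node’s own settings for its actual naming.

```python
import gzip
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a workflow item: JSON fields plus named binary properties."""
    json: dict = field(default_factory=dict)
    binary: dict[str, bytes] = field(default_factory=dict)


def decompress_item(item: Item, input_property: str = "data",
                    output_prefix: str = "file_") -> Item:
    """Gunzip one binary property and store the result under <prefix><index>.

    A .gz stream decompresses to a single file, so this model always yields
    <prefix>0; an archive holding several files would yield <prefix>0, <prefix>1, ...
    The input property is not carried over, so later steps must use the new name.
    """
    decompressed = gzip.decompress(item.binary[input_property])
    return Item(json=dict(item.json), binary={f"{output_prefix}0": decompressed})
```

In this model, a step that still asks for `data` after decompression finds nothing, while one that asks for `file_0` gets the decompressed XML, matching the behavior described above.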

Thanks for the update, this does look to get me moving in the right direction.