How to extract data multiple times from single file

Describe the problem/error/question

So I’m trying to extract different things from a list of Excel files. I need a string from a particular cell, then one from the last cell, and so on…

So what’s the n8n way to do this kind of thing: read a file once, access it multiple times, and save bits to Set fields so we can use them later on?

I was thinking of Read File → Extract 1 → Set Field → Extract 2 → Set Field → Extract 3… etc.

This is not working… only the first Extract from File node has access to the binary data. All others complain with the error “This operation expects the node’s input data to contain a binary file ‘data’, but none was found [item 0]”. Even if I use the option “Input Fields to Include”.

I know I can branch out from the initial Read File node, but then I run into the issue of losing access to each item. That is to say, a Set Field for File #1 is not accessible down the line.

What is the error message (if any)?

This operation expects the node’s input data to contain a binary file ‘data’, but none was found [item 0]

Please share your workflow

Information on your n8n setup

  • n8n version: 1.91.3
  • Database (default: SQLite): pgsql
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): docker
  • Operating system: ubuntu

Hi, just by looking at it, this could be shortened a lot. n8n already loops automatically over the incoming items.

As you are self-hosting, you could install the xlsx package and use a single Code node to do everything after the Read/Write Files node.

If you want to keep it modular, you could have just one Extract from File node and then reference its data in the other nodes.
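For instance (a sketch; "Extract from File" and `headerField` are placeholder names, adjust them to your actual node and field names), a later node can pull an earlier node's output with an n8n expression like:

```
{{ $('Extract from File').item.json.headerField }}
```

That way the data only has to be extracted once, and every downstream node reads from the same source.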

Hey, thanks for your pointers. The Excel files are not very well structured. There are blocks: one is a header, and another block is the table I need to import into PostgreSQL later on.

That’s why I’m trying to read it in pieces: a range to get the header, a predefined cell to get an ID, and then another range to get a clean table.

I could read the entire file once and then parse all those things, including the table. But I thought I’d just use the built-in capabilities of n8n’s Extract from File node.
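The parse-once idea could look roughly like this in a Code node. This is only a sketch: it assumes the sheet was already extracted as an array of row arrays, and the cell positions (an ID in B2, the table in A10:E14 with column names in its first row) are invented for illustration.

```javascript
// Sketch of a "parse everything in one pass" helper.
// `rows` is assumed to be the whole sheet as an array of row arrays.
function parseSheet(rows) {
  // Header block: fixed positions (B2 = 0-indexed row 1, col 1 -- adjust!).
  const id = rows[1][1];

  // Table block: rows 10-14 (A10:E14); first table row holds the column names.
  const tableRows = rows.slice(9, 14);
  const [columns, ...dataRows] = tableRows;
  const table = dataRows.map(r =>
    Object.fromEntries(columns.map((c, i) => [c, r[i]]))
  );

  return { id, table };
}
```

The table ends up as an array of plain objects, which is the shape the Postgres node expects for inserts.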

The looping could perhaps be avoided, but that’s something I can look into later. The real issue is: how can I refer to the same binary file later in the workflow?

Thanks again.

How many Excel files are you reading?
And do the cells in these Excel files cross-reference each other?

Maybe 10 files. There are no cross references.

Hi

Here is my quick thought about your requirement.

Split the path into branches and merge them afterwards.

I tested with a random sheet downloaded from Google Drive and it seems to work. Not sure about your Excel file.


Do you have an example of one of these Excel files? Technically you should just read the Excel file once and then simply read the different cells you require.

So this is an example:

I wanted to get some data from the header, and later read the table from A10:E14. That table will then be ingested into postgresql.

Based on @darrell_tw’s suggestion, I built a working version using the Excel data you provided.

Since I don’t understand your language, I went and created a similar Excel file:

Which produces this result:

Header data is extracted from fixed positions, and the table data is converted into a JSON object. This can now be used to push into a db.


Hey, thanks to both @darrell_tw and @Wouter_Nigrini. I never considered the Aggregate node before. Code nodes I try to avoid if possible.

I had something like this, but after the loop, when inserting into PostgreSQL, I had trouble accessing the Set variables from inside the loop… say, using the date in the SQL query that sends the table to PostgreSQL.

Let me look into it again. I’ll let you know if it worked with the Aggregate node.

OK, so it seems I oversimplified my own use case…

What if the range for the table depends on the header? Then we need to read the header first to determine the range for the table, because there are two types of Excel file…

So we do need to first extract the file type from the header, then set the range accordingly, and then re-read the file to get the table from the correct range…
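One way to avoid re-reading the binary is to extract the whole sheet once and branch on the type in data. A sketch, assuming the type marker sits in cell A1 and that `TYPE_A`/`TYPE_B` and their ranges are invented placeholders for the two real layouts:

```javascript
// Known table range per file type (row indices are 0-based; both the
// type names and the ranges here are made up for illustration).
const RANGE_BY_TYPE = {
  TYPE_A: { start: 9, end: 14 },   // rows 10-14 (e.g. A10:E14)
  TYPE_B: { start: 11, end: 18 },  // rows 12-18, for the second layout
};

// Pick the table rows for a sheet, given all its rows from one extraction.
function tableFor(rows) {
  const type = rows[0][0];                 // cell A1 holds the type marker
  const range = RANGE_BY_TYPE[type];
  if (!range) throw new Error(`Unknown file type: ${type}`);
  return rows.slice(range.start, range.end);
}
```

With this shape the file is read once; the "re-read with a different range" step becomes an in-memory slice instead of a second Extract from File node.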