Not all S3 folders listed by the AWS S3 node's folder Get All operation?

I am using the latest n8n build. Get all folders does not return all folders. It might be the case that programmatically created folders are not picked up by the AWS S3 node's Get all folders operation.

Anyone else facing the same issue?

Hm, this was working fine for me last time I used the node.

Can you confirm how exactly you created the folder that is missing when getting all folders through n8n’s AWS S3 node? Do you have any IAM policy preventing folder access, and can you share your node?

I’ve got the same problem. If I create a folder in the S3 console or via the n8n node, then all is good - Get all folders works. But if I upload a folder containing files via the S3 console, then Get all folders doesn’t see that folder.

I’ve fallen foul of this using other tools: it seems that when the folder is created as part of creating a file, it isn’t really a folder - it is simply part of the key path. You can see evidence of this because if you delete all of the files in this type of folder, the folder disappears.
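A quick sketch of why this happens (the key names are made up and this is not the node's actual code - just an illustration of the two ways a tool can decide what counts as a folder):

```javascript
// S3 has no real directories. A "folder" is either an explicit
// zero-byte object whose key ends in "/" (what the console or a
// Create Folder operation writes), or just a prefix implied by
// the keys of other objects.
const keys = [
  "created-in-console/",        // explicit zero-byte folder marker
  "created-in-console/a.jpg",
  "uploaded-with-files/b.jpg",  // no marker: folder is implicit
];

// A tool that treats only keys ending in "/" as folders misses the
// implicit one:
const markerFolders = keys.filter((k) => k.endsWith("/"));
console.log(markerFolders); // ["created-in-console/"]

// Deriving folders from every key's path (top-level prefixes only,
// for brevity) finds both:
const impliedFolders = [...new Set(
  keys
    .filter((k) => k.includes("/"))
    .map((k) => k.slice(0, k.indexOf("/") + 1))
)];
console.log(impliedFolders); // ["created-in-console/", "uploaded-with-files/"]

// And if b.jpg is deleted, nothing implies "uploaded-with-files/"
// any more, so that "folder" disappears - the behaviour described above.
```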

Not quite sure how to deal with this. I guess the first question for @MutedJam is whether my diagnosis is correct?

Here’s an article that is worth a read: S3 keys are not file paths – alexwlchan


That’s great to learn and I didn’t know about this. It would certainly explain the behaviour, though I am not sure if that’s a bug or simply how S3 works. I assume you can see these files when running Get All for the File resource?

@MutedJam Get all files works as it should. As corroborated by @scottjscott, only folders act funny!

Hey all, just wanted to let you know that I was able to reproduce the behaviour. Folders created through n8n’s Folder → Create operation, as well as folders created through the S3 console directly, appear as expected, but folders created implicitly as part of a file upload (like folder_created_through_n8n_file_upload/BBw9PAyqSJM.jpg) are not.

I am not 100% sure this is a bug, but have added it to our internal bug tracker for a closer look to get some additional eyes on this.

Hi @MutedJam,

I also think there might be a problem when you have multiple folders in a bucket and the first of these folders is empty - it effectively prevents you from running a flow like the following:

  1. Get all folders to get a list of folders that need to be processed
  2. Get all files for each of the folders (setting the folder key to the folder returned in step 1)

If the first folder is empty then it doesn’t proceed to the next folder.
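Outside n8n, the intended traversal reads like the sketch below. listFolders and listFiles are hypothetical stand-ins for the node's Get all folders / Get all files operations; the point is simply that an empty folder should be skipped rather than halting the loop:

```javascript
// Hypothetical stubs standing in for the S3 node's operations.
// The bucket is modelled as a map of folder key -> file keys.
function listFolders(bucket) {
  return Object.keys(bucket);
}
function listFiles(bucket, folder) {
  return bucket[folder] ?? [];
}

const bucket = {
  "empty-folder/": [],                    // first folder has no files
  "full-folder/": ["full-folder/a.wav"],
};

const processed = [];
for (const folder of listFolders(bucket)) {
  const files = listFiles(bucket, folder);
  if (files.length === 0) continue;       // skip instead of stopping
  processed.push(...files);
}
console.log(processed); // ["full-folder/a.wav"]
```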



I’m really struggling to come up with even a workaround here - the behaviour of the S3 node doesn’t seem to be consistent with the general principles at work within n8n.

I will share a flow with you just as soon as I can rework the entire thing to remove all of the sensitive information, but in the meantime I’ll tell you what I’m trying to do.

I am using S3 as cloud storage for a speech and text analytics pipeline. My S3 bucket structure is set up with the following logic:


So for example, where I have an audio file source of “call-recordings” supplied by the customer service department, the landing folder would look something like this:


but I could also have:


So, I have 6 different departments, each providing hundreds of recordings every day into their department folder, broken down by a folder for each date. Following this structure is important for the speech and text analytics pipeline.

I’ve got the entire speech and text analytics process working within n8n via the HTTP node (the Transcribe/Comprehend nodes are missing features I currently need), but it is all breaking down at the S3 processing step because I cannot reliably traverse the folder structure using a series of “Get all folders” requests leading up to a “Get all files” request in the final date folder.

The initial “Get all folders” gets me a list of folders, but the second “Get all folders” request just stops the flow when it encounters an empty folder. I’ve tried with and without split nodes and it doesn’t make any difference - it just stops in its tracks (even when I enable “Always Output Data”).

As I’ve said, I’ll work up an example flow that doesn’t have the sensitive data in it.


@scottjscott You can maybe try getting all files while passing the folder key as a JS expression. This works as long as files are present in the referenced folder - even if the folder only exists as a path reference created when a file was uploaded to S3.
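For illustration, the expression in the Get All Files operation’s folder key field might look something like this (the field name Key coming out of the previous node is an assumption here - check the actual output of your own folder node):

```
{{ $json["Key"] }}
```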


Hey @scottjscott,

While waiting for a clean version of your workflow: what about enabling “Always Output Data”, then checking if the response is empty? If it is, you can go back to the start of the loop or do something else; if it isn’t, you can carry on as normal.
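As a plain-JS sketch of that check (outside n8n, with a hard-coded stand-in for the node output): as far as I know, with “Always Output Data” enabled a node that would otherwise output nothing emits a single item with an empty json object, so an IF-node condition could test for exactly that:

```javascript
// Stand-in for what an empty "Get all files" run would emit with
// "Always Output Data" enabled: one item with an empty json object.
const items = [{ json: {} }];

// The emptiness test an IF node's expression could perform:
const isEmpty =
  items.length === 1 && Object.keys(items[0].json).length === 0;

console.log(isEmpty); // true -> loop back / handle the empty folder
```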