AWS S3 node: read folders and files within a path

Hi there,
I’m building a workflow to push S3 logs into a Postgres database. The logs originate from another system, which compresses them and uploads them to S3 using this pattern: s3://bucket/YYYY/MM/DD/file.gz

Example: s3://tracking-data/web/2021/01/15/1610741019847.gz

I couldn’t find a way to specify which folder to list with Get All, nor a way to download a specific file inside a folder. Is there a way to specify files and/or folders for Get All operations?

If I point to a specific file in the S3 root directory, the workflow works perfectly, but the log files come partitioned into date folders. BTW, thanks for the Compression node, it solved the .gz part neatly :slight_smile:

Thanks

{
  "nodes": [
    {
      "parameters": {},
      "name": "Start",
      "type": "n8n-nodes-base.start",
      "typeVersion": 1,
      "position": [
        40,
        300
      ]
    },
    {
      "parameters": {
        "functionCode": "return items.flatMap(function(item) {\n  const b64value = item.binary['file_0'].data;\n  const lines = Buffer.from(b64value, 'base64').toString().trim();\n  \n  return lines.split('\\n').map(function(el) {\n      const record = JSON.parse(el);\n      return {json: record}\n  })\n})\n"
      },
      "name": "Log to JSON",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        860,
        300
      ]
    },
    {
      "parameters": {
        "batchSize": 1,
        "options": {}
      },
      "name": "SplitInBatches",
      "type": "n8n-nodes-base.splitInBatches",
      "typeVersion": 1,
      "position": [
        660,
        300
      ]
    },
    {
      "parameters": {
        "keepOnlySet": true,
        "values": {
          "string": [
            {
              "name": "userId",
              "value": "={{$json[\"context\"][\"traits\"][\"userId\"] || null}}"
            },
            {
              "name": "carrier",
              "value": "={{$json[\"context\"][\"network\"][\"carrier\"]}}"
            },
            {
              "name": "device",
              "value": "={{$json[\"context\"][\"device\"][\"manufacturer\"]}} {{$json[\"context\"][\"device\"][\"model\"]}}"
            },
            {
              "name": "os",
              "value": "={{$json[\"context\"][\"os\"][\"name\"]}} {{$json[\"context\"][\"os\"][\"version\"]}}"
            }
          ]
        },
        "options": {}
      },
      "name": "Set",
      "type": "n8n-nodes-base.set",
      "typeVersion": 1,
      "position": [
        1050,
        300
      ]
    },
    {
      "parameters": {
        "table": "test",
        "columns": "userId,carrier,device,os"
      },
      "name": "Save to DW",
      "type": "n8n-nodes-base.postgres",
      "typeVersion": 1,
      "position": [
        1180,
        490
      ],
      "credentials": {
        "postgres": "n8n-pg"
      },
      "disabled": true
    },
    {
      "parameters": {},
      "name": "Compression",
      "type": "n8n-nodes-base.compression",
      "typeVersion": 1,
      "position": [
        450,
        300
      ]
    },
    {
      "parameters": {
        "bucketName": "segment-tracking-data-sandbox",
        "fileKey": "1610741019847.b3282638195e.fde281a.e5a0b38d-df4a-471e-af63-19dad80bfdcb.gz"
      },
      "name": "AWS S3",
      "type": "n8n-nodes-base.awsS3",
      "typeVersion": 1,
      "position": [
        230,
        300
      ],
      "credentials": {
        "aws": "segment-tracking-data-sandbox-dev"
      }
    }
  ],
  "connections": {
    "Start": {
      "main": [
        [
          {
            "node": "AWS S3",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Log to JSON": {
      "main": [
        [
          {
            "node": "Set",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "SplitInBatches": {
      "main": [
        [
          {
            "node": "Log to JSON",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set": {
      "main": [
        [
          {
            "node": "Save to DW",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Save to DW": {
      "main": [
        [
          {
            "node": "SplitInBatches",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Compression": {
      "main": [
        [
          {
            "node": "SplitInBatches",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "AWS S3": {
      "main": [
        [
          {
            "node": "Compression",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
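
For readability, here is the “Log to JSON” Function node code from the workflow above, un-escaped and with comments describing each step:

return items.flatMap(function (item) {
  // The Compression node stores the decompressed log as base64 binary
  // data under the property name 'file_0'
  const b64value = item.binary['file_0'].data;
  const lines = Buffer.from(b64value, 'base64').toString().trim();

  // Each line of the log is a standalone JSON record (NDJSON),
  // so parse line by line and emit one n8n item per record
  return lines.split('\n').map(function (el) {
    const record = JSON.parse(el);
    return { json: record };
  });
});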

I couldn’t find a way to specify which folder to list with Get All, nor a way to download a specific file inside a folder. Is there a way to specify files and/or folders for Get All operations?

Use the operation file:getAll with the Folder Key option. For example, my bucket has the following structure: s3://n8n/folder1/folder2/1.png. If I wanted to list all the files within folder2, the Bucket Name would be n8n and the Folder Key would be folder1/folder2/. The response will include a Key property. Use that Key property to download the file with the operation file:download.
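
As a minimal sketch using the example above (the exact parameter names here are my reading of the node’s options and may differ slightly in your n8n version), the listing and download nodes would look something like:

{
  "parameters": {
    "operation": "getAll",
    "bucketName": "n8n",
    "returnAll": true,
    "options": {
      "folderKey": "folder1/folder2/"
    }
  },
  "name": "S3 List Files",
  "type": "n8n-nodes-base.awsS3",
  "typeVersion": 1
},
{
  "parameters": {
    "bucketName": "n8n",
    "fileKey": "={{$json[\"Key\"]}}"
  },
  "name": "S3 Download File",
  "type": "n8n-nodes-base.awsS3",
  "typeVersion": 1
}

The download node needs no operation parameter (download is the default, as in your workflow) and uses an expression to read the Key property returned by the listing node, so each listed file gets downloaded in turn.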

Also, I just saw the workflow and don’t think the SplitInBatches node is needed there. Not sure what you want to do, though.

Thanks @RicardoE105! I’ll research further on my end, because the objects I uploaded using the AWS S3 CLI aren’t visible to this node, only those I’ve created using the S3 node.
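
As a sanity check, I’ll compare what the CLI lists under the same credentials against what the node returns, using the example bucket and prefix from my first post:

# List the objects the CLI can see under the date prefix
aws s3api list-objects-v2 --bucket tracking-data --prefix web/2021/01/15/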

I’ll get back when I understand why this is happening, so that the solution gets properly documented.

Yes, it doesn’t make sense in this current workflow. I added it when I had several files in the pipeline: the log files can be large, and n8n was throwing memory exceptions.

I intend to publish my final workflow; it may be useful to others, since it’s a pretty generic use case.
