Downloading a poorly formatted spreadsheet and converting to JSON

Hi all, noob here. I’ve been banging my head against the wall trying to uncover what I thought to be a simple problem.

I get a monthly report (xlsx) sent to my Gmail. I want to convert that report to JSON for later manipulation.

First problem, Gmail has two attachments (because of a signature picture). I tried to strip that out using code, but whenever I mess with the binary using JS the “Extract from File” node fails with The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined

I created a “Split Out” & “Filter” workaround, but now I’m finding that the excel file has extra header data that I need to get rid of (Actual headers start on row 5). The “Extract from Sheet” splits the whole sheet into 28 separate items, but they are not looking like I need them to, but I lack the coding to be able to fix this problem (and ChatGPT is not helping me).

Current output:

[
  { row: [null, null, "Monthly Water Utility Charge Report"] },
  { row: [null, "Property: Wolf Creek Village 2586"] },
  ...
  { row: ["UnitList", "Account Number", "Name", "Consumption", "TotalCharged"] },
  { row: ["108", null, "Abe Aboleth", 669.3, 6.3] },
  ...
]

Desired output:

{
  "UnitList": "108",
  "Name": "Abe Aboleth",
  "Consumption": 669.3,
  "TotalCharged": 6.3
}

Is the “Split out/Filter” combo the best way to handle this? Can anyone help me on how to get my JSON fixed?

Also, I think the code thing is a bug, I figure someone who is better at this than I can tell me if that is true or not.

Hi @Guyanthalas Welcome to the community :tada:

If I understood you correctly, “The Extract from Sheet” splits the whole sheet into 28 separate items, so I created an “Aggregate” node then a “Code” node.

The code node will extract the row after "UnitList"

Could you please check this workflow after editing the aggregate node “Input Field Name”?

1 Like

The aggregate function worked a treat! That helped a lot with figuring out how to deal with the Code node.
Your provided code worked for keeping the data in rows, but I needed the JSON to be more strucutred. This is the code that worked out to do that:

const output = [];

for (const item of $input.all()) {
  const allRows = item.json.row;

  // Find the header row (the one containing "UnitList")
  const headerIndex = allRows.findIndex(row =>
    Array.isArray(row) && row.includes("UnitList")
  );

  if (headerIndex === -1) {
    throw new Error("Header row with 'UnitList' not found.");
  }

  const header = allRows[headerIndex];
  const dataRows = allRows.slice(headerIndex + 1);

  // Convert each data row to an object
  for (const row of dataRows) {
    if (!Array.isArray(row) || row.every(cell => cell == null || cell === "")) {
      continue; // Skip empty or malformed rows
    }

    const obj = {};
    header.forEach((key, i) => {
      obj[key] = row[i];
    });

    output.push({ json: obj });
  }
}

return output;

Thank you so much for your assistance! It is greatly appreciated.

1 Like