Parse formatted text from spreadsheet into relational database

I have an xls file in my NextCloud that has a tab of product data. I created it for our developers who are working on an ERP for us. I put all the options and attributes for the products in a single column, nested together for readability and following a specific, predictable format. I want to make it easy to get this relational data out of these cells and parse it into multiple MySQL database tables either directly or through our API.

My workflow so far is downloading the NextCloud file to n8n, then using the Spreadsheet File node to read the data. From there I can easily process the whole table into the products database table, but I’m not sure how to split the text in the options column and send it to the appropriate related attribute tables. Here is an idea of what the text looks like in that column:

Siding: LP Smartside (outdoor paint, custom paint $62.85), T1 - 11 (stain);
Trim: LP Smartside (outdoor paint, custom paint 20.95);
Roof: TuffRib Panels (standard, copper 104.75, galvalume), 5V-Crimp Panels 209.50 (standard, copper 104.75, galvalume);
Doors: 13"x30" chicken door, 24"x48" back man door, 24"x69" man door;
Windows: (2) 18"x23";
Exterior: 2’ extension (500);
Electrical: Louvered Fan with Thermostat ($355), 1 light, 1 receptacle and 1 switch (215), Automatic Door Opener (310), Solar Powered Auto Door Opener (510);
Insulation: Heat Shield;
Misc: FRP Flooring, Wire Screen for windows (20), Anchors - uninstalled (50);

I haven’t fully mapped out exactly what the tables and columns will necessarily be, but lets say the Siding, Trim, etc, are all Features. The LP Smartside and TuffRib Panels are Options related to those features and inside the parentheses (outdoors paint…) are Attributes for those Options. Some of the Options have prices (they are add-ons) and others are choices that are included in the base price. A starting example of a JSON parsing might look something like this:

{
    "Products": [
      {
        "product_id": 1,
        "product_name": "4x8 Chicken Coop - Self Ranging",
        "sku": "CCAR48",
        "base_price": 2095,
        "features": [
          {
            "feature_id": 1,
            "feature_name": "Siding",
            "options": [
              {
                "option_id": 1,
                "option_name": "LP Smartside",
                "attributes": [
                  {"attribute_id": 1, "attribute_name": "outdoor paint", "price": null},
                  {"attribute_id": 2, "attribute_name": "custom paint", "price": "$62.85"}
                ]
              },
              {
                "option_id": 2,
                "option_name": "T1 - 11",
                "attributes": [
                  {"attribute_id": 3, "attribute_name": "stain", "price": null}
                ]
              }
            ]
          }
        ]
      }
    ],
    "Attributes": [
      {"attribute_id": 1, "attribute_name": "outdoor paint"},
      {"attribute_id": 2, "attribute_name": "custom paint"},
      {"attribute_id": 3, "attribute_name": "stain"}
    ]
  }

I’ve looked at the splitting options that I could find, but I haven’t seen what I’m looking for so far. I can see the method to break this down, though. All the Features are preceded by a colon. All the Options are following the colon, comma delimited and outside the parenthesis. All the Attributes are inside parentheses and comma delimited. I could change the delimiting or formatting if need be. Can anyone provide me with some guidance on how to proceed?

Information on your n8n setup

  • **n8n version:1.20.0
  • **Database (default: SQLite)
  • **Running n8n via (Docker self-hosted)
  • **Operating system: Ubuntu

It looks like your topic is missing some important information. Could you provide the following if applicable.

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

Does anyone have any suggestions?

Hi @omrto, welcome to the community!

It looks like the relevant details here live in the features field of each item in Products. Is this the data structure you currently have after processing your spreadsheet? If so, I think you could first split up the data from your Products, then features, features.options, and finally ['features.options'].attributes.

If needed you could then create individual columns based on the options. Here’s an example workflow:

With your example data it would produce a result like this:

Is this what you had in mind? If not, can you share an example of the JSON data you currently have after reading your spreadsheet file and what you would like to see instead?

1 Like

Thanks for your reply MutedJam. Actually, the data in the spreadsheet is not broken down into JSON yet. I need to parse it into JSON data that resembles the sample JSON I posted. From there I think your example would likely help further split it up so I can push it into the proper places in the database.