Extracting and splitting HTML data

Hi,
I have been following many topics here.
I am using an RSS feed and trying to split the values inside it according to their labels. So far I have only been able to get the label names by extracting “b” and “p”, using other ideas from here:


But I am still not able to:

  • Get the respective data that follows each label, as you can see in the flow below
  • Separate the “img src” and “href” values into separate parts of the array

Is this possible to achieve? Thank you very much.

My flow

{
  "nodes": [
    {
      "parameters": {
        "url": "http://www.trumba.com/calendars/brisbane-city-council.rss"
      },
      "name": "RSS Feed Read1",
      "type": "n8n-nodes-base.rssFeedRead",
      "typeVersion": 1,
      "position": [
        280,
        770
      ],
      "notes": "Brisbane\nhttps://www.trumba.com/calendars/redland-city-council.rss"
    },
    {
      "parameters": {
        "keepOnlySet": true,
        "values": {
          "string": [
            {
              "name": "contentSnippet",
              "value": "={{$json[\"content\"]}}"
            },
            {
              "name": "Content",
              "value": "={{$json[\"contentSnippet\"]}}"
            }
          ]
        },
        "options": {}
      },
      "name": "Set4",
      "type": "n8n-nodes-base.set",
      "typeVersion": 1,
      "position": [
        570,
        770
      ]
    },
    {
      "parameters": {
        "dataPropertyName": "contentSnippet",
        "extractionValues": {
          "values": [
            {
              "key": "text1",
              "cssSelector": "b",
              "returnArray": true
            },
            {
              "key": "text",
              "cssSelector": "a"
            }
          ]
        },
        "options": {}
      },
      "name": "HTML Extract1",
      "type": "n8n-nodes-base.htmlExtract",
      "typeVersion": 1,
      "position": [
        870,
        710
      ]
    },
    {
      "parameters": {
        "dataPropertyName": "Content",
        "extractionValues": {
          "values": [
            {
              "key": "text1",
              "cssSelector": "p",
              "returnArray": true
            },
            {
              "key": "text",
              "cssSelector": "a"
            }
          ]
        },
        "options": {}
      },
      "name": "HTML Extract4",
      "type": "n8n-nodes-base.htmlExtract",
      "typeVersion": 1,
      "position": [
        870,
        890
      ]
    }
  ],
  "connections": {
    "RSS Feed Read1": {
      "main": [
        [
          {
            "node": "Set4",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set4": {
      "main": [
        [
          {
            "node": "HTML Extract1",
            "type": "main",
            "index": 0
          },
          {
            "node": "HTML Extract4",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
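
For the second point (pulling the “img src” and “href” values out on their own), one option is a Function node placed after the Set node. The following is only a minimal sketch, not a full HTML parser: the helper name extractLinksAndImages is hypothetical, it assumes the raw HTML is available as a string, and it only handles quoted attribute values.

```javascript
// Hypothetical helper: collects quoted src values of <img> tags and
// quoted href values of <a> tags from an HTML string using regular
// expressions. Unquoted attributes are not handled.
function extractLinksAndImages(html) {
  const attr = (tag, name) => {
    // <tag ... name="value"> or <tag ... name='value'>
    const pattern = new RegExp(
      `<${tag}\\b[^>]*?\\b${name}\\s*=\\s*(["'])(.*?)\\1`,
      "gi"
    );
    return Array.from(html.matchAll(pattern), (m) => m[2]);
  };
  return { images: attr("img", "src"), links: attr("a", "href") };
}

// Inside an n8n Function node this could be used as (assuming the HTML
// is stored in the "contentSnippet" property, as in the Set node above):
//   for (const item of items) {
//     Object.assign(item.json, extractLinksAndImages(item.json.contentSnippet));
//   }
//   return items;
```

Each item would then carry an images array and a links array next to its existing properties.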

I just saw that I never mentioned my solution to the problem I was facing. In the end, this is what worked for me:

items[0].json.myVariable = items[0].json.data.match(/Description Work.*?./s);
return items;

Hi Damian,
While this works for JSON, I am having a hard time making it work with the HTML Extract node. Since all my data is in one “string”, I can’t seem to separate it into an array of values.
The “p” and “b” selectors work, but the rest doesn’t. Apologies, but I am pretty new to n8n, and JavaScript is still a foreign language for me :slight_smile:. Thank you.

For example, here I would like to separate:

Just have https://www.trumba.com/i/DgClab3f1OO0xII3RnmSz6SG.jp

<a href

All that could give an array JSON as:
Ongoing through Sunday, April 18, 2021



Tammy Law will be Museum of Brisbane’s (MoB’s) Artist in Residence during the 2021 BrisAsia Festival. Drawing on her documentary photographic practice, Tammy’s residency will present personal stories to challenge and inspire conversations around forced migration.

MoB’s Artist in Residence program is supported by Tim Fairfax AC.

View all BrisAsia Festival events, including festival highlights, on the BrisAsia Festival page and for further information on this exhibition, please visit Museum of Brisbane’s website

VenueMuseum of Brisbane, Brisbane City
Venue addressMuseum of Brisbane, Brisbane City Hall, 64 Adelaide Street, Brisbane City
Parent event: BrisAsia Festival
Event type: Art, Culture, Featured, Festivals, Free, LIVE
Cost: Free

All separate:

So far got this:

Thanks
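
The labelled lines in the sample above (Venue, Venue address, Parent event, Event type, Cost) could be split into separate fields in a Function node. This is a rough sketch under a few assumptions: the helper name splitByLabels is hypothetical, the text has one label per line, and the list of label names is already known (for example from the “b” extraction). It copes with labels that are fused to their value, as in “VenueMuseum of Brisbane”.

```javascript
// Hypothetical helper: turns lines such as "Cost: Free" or
// "VenueMuseum of Brisbane" into an object keyed by label.
function splitByLabels(text, labels) {
  const ordered = labels
    .map((l) => l.replace(/:\s*$/, "").trim()) // drop a trailing colon
    .filter((l) => l.length > 0)
    // Try longer labels first so "Venue address" wins over "Venue".
    .sort((a, b) => b.length - a.length);

  const fields = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    const label = ordered.find((l) => line.startsWith(l));
    if (label === undefined) continue; // not a labelled line, skip it
    // Remove the label plus an optional colon and surrounding spaces.
    fields[label] = line.slice(label.length).replace(/^\s*:?\s*/, "");
  }
  return fields;
}

// Inside an n8n Function node this could be used as (assuming the plain
// text is stored in a property named "Content"):
//   const labels = ["Venue", "Venue address", "Parent event", "Event type", "Cost"];
//   for (const item of items) {
//     item.json.fields = splitByLabels(item.json.Content, labels);
//   }
//   return items;
```

Lines that do not start with a known label, such as the description paragraphs, are skipped. A description line that happens to begin with a label word would be picked up by mistake, so the label list should be kept as specific as possible.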

The data I am working with comes from the HTML Extractor :wink:

This does not work for me. I get this error: ERROR: Cannot read property ‘match’ of undefined
I tried the other similar approaches and still get errors.

Then your item does not have a property named data. You have to adapt the code to match your use case. So if the match should happen on the property text1, you would replace data with text1, and the code would then be:

items[0].json.myVariable = items[0].json.text1.match(/Description Work.*?./s);
return items;
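
To avoid the “Cannot read property ‘match’ of undefined” error altogether, the match can be guarded. The sketch below is only an illustration and the helper name firstMatch is hypothetical. It returns null when the property is missing or is not a string. In the flow posted above, text1 is extracted with returnArray enabled, so it would be an array and each element would need to be matched separately. Note also that match returns an array (or null when nothing matches), so the helper picks out the matched text itself.

```javascript
// Hypothetical helper: returns the first regex match in `text`, or null
// when `text` is not a string or when nothing matches.
function firstMatch(text, pattern) {
  if (typeof text !== "string") {
    return null; // property missing on the item, or not a string
  }
  const result = text.match(pattern);
  return result === null ? null : result[0];
}

// Inside an n8n Function node this could be used as:
//   for (const item of items) {
//     item.json.myVariable = firstMatch(item.json.text1, /Description Work.*?./s);
//   }
//   return items;
```

Looping over all items instead of only items[0] means every entry of the feed gets processed, not just the first one.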