Extracting text between HTML tags

Hi,
I am trying to keep only the text between HTML tags.

From this: <a href=http:" target="_blank" rel="noopener">Mount Cotton Hillclimb<br />47-87 Gramzow Road<br />Mount Cotton</a>

I am trying to just keep: Mount Cotton Hillclimb
47-87 Gramzow Road
Mount Cotton

As a start, I tried items[0].json.myVariable = items[0].json.location.match(/noopener">.*?./s) to keep at least the right-hand side, planning to strip the rest with a replace, but the double quotes (") in the match pattern stop it from working. Is there a simpler way to do this? Thank you very much.

{
  "nodes": [
    {
      "parameters": {
        "functionCode": "return [\n{ \njson: \n{\n\"location\": \"<a href=http:\" target=\"_blank\" rel=\"noopener\">Mount Cotton Hillclimb<br />47-87 Gramzow Road<br />Mount Cotton</a>\"\n}\n}]\n"
      },
      "name": "json",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        940,
        1510
      ]
    },
    {
      "parameters": {
        "functionCode": "items[0].json.myVariable = items[0].json.location.match(/noopener.*?./s);\nreturn items;\n"
      },
      "name": "Extract",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        1130,
        1510
      ]
    }
  ],
  "connections": {
    "json": {
      "main": [
        [
          {
            "node": "Extract",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
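
A note on why the attempt above captures so little, as a minimal sketch in plain JavaScript run outside n8n (the names html, lazy, and anchored are illustrative only, not part of the workflow). A lazy ".*?" followed by a single "." matches as little as it can, which is one character. Separately, a string that itself contains double quotes is easier to write with backticks than with a double-quoted literal.

```javascript
// Backticks (a template literal) allow unescaped double quotes inside the string.
const html = `<a href=http:" target="_blank" rel="noopener">Mount Cotton Hillclimb<br />47-87 Gramzow Road<br />Mount Cotton</a>`;

// ".*?" is lazy, and the only thing after it is ".", so the match ends
// right after the first character that follows 'noopener">'.
const lazy = html.match(/noopener">.*?./s);
console.log(lazy[0]); // noopener">M

// Giving the lazy group something to stop at (the closing tag) captures
// the whole inner content instead. The <br /> tags are still included.
const anchored = html.match(/noopener">(.*?)<\/a>/s);
console.log(anchored[1]);
```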

Check the example below:

{
  "nodes": [
    {
      "parameters": {},
      "name": "Start",
      "type": "n8n-nodes-base.start",
      "typeVersion": 1,
      "position": [
        250,
        300
      ]
    },
    {
      "parameters": {
        "functionCode": "return [\n{ \njson: \n{\n\"location\": `<a href=http:\" target=\"_blank\" rel=\"noopener\">Mount Cotton Hillclimb<br />47-87 Gramzow Road<br />Mount Cotton</a>`\n}\n}]\n"
      },
      "name": "json",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        540,
        300
      ]
    },
    {
      "parameters": {
        "functionCode": "const content = item.location.match(/<a.*?>(.*)<\\/a>/)\nitem.location = content[1]\nreturn item"
      },
      "name": "FunctionItem",
      "type": "n8n-nodes-base.functionItem",
      "typeVersion": 1,
      "position": [
        800,
        300
      ]
    }
  ],
  "connections": {
    "Start": {
      "main": [
        [
          {
            "node": "json",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "json": {
      "main": [
        [
          {
            "node": "FunctionItem",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
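
The regex in the FunctionItem node above keeps the <br /> tags in the result. If plain text with line breaks is wanted, the same idea can be sketched as follows. This is a standalone example for plain Node.js, not n8n-specific code, and extractAnchorText is a hypothetical helper name introduced here.

```javascript
// Sample input taken from the thread (the href is truncated in the original).
const html = `<a href=http:" target="_blank" rel="noopener">Mount Cotton Hillclimb<br />47-87 Gramzow Road<br />Mount Cotton</a>`;

// Hypothetical helper: returns the text inside the first <a>...</a>,
// with <br> tags turned into newlines, or null when there is no match.
function extractAnchorText(input) {
  // [^>]* skips the attributes; the "s" flag lets "." span line breaks.
  const match = input.match(/<a\b[^>]*>(.*?)<\/a>/s);
  if (match === null) {
    return null;
  }
  // Accepts <br>, <br/> and <br /> in any letter case.
  return match[1].replace(/<br\s*\/?>/gi, '\n').trim();
}

console.log(extractAnchorText(html));
```

Inside a FunctionItem node, the same helper could be applied to item.location, assuming that field holds the HTML string as in the workflow above.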

Thank you Ricardo, but I am getting this error:

ERROR: Cannot read property '1' of null

TypeError: Cannot read property '1' of null
    at /usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes:2:24
    at Object.<anonymous> (/usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes:3:13)
    at NodeVM.run (/usr/local/lib/node_modules/n8n/node_modules/vm2/lib/main.js:1167:29)
    at Object.execute (/usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes/FunctionItem.node.js:77:37)
    at Workflow.runNode (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/dist/src/Workflow.js:492:37)
    at /usr/local/lib/node_modules/n8n/node_modules/n8n-core/dist/src/WorkflowExecute.js:416:62
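
This TypeError means that match() returned null, so indexing it with [1] failed. The thread does not say why the pattern did not match. One possible cause, assumed here purely for illustration, is input that contains real line breaks, since "." does not match a newline without the "s" flag. The sketch below (plain JavaScript; safeExtract and the sample strings are hypothetical) shows the behaviour and a guard against it.

```javascript
// Same pattern as in the FunctionItem node, without the "s" (dotAll) flag.
const pattern = /<a.*?>(.*)<\/a>/;

// Hypothetical sample inputs: one on a single line, one with real newlines.
const singleLine = '<a rel="noopener">Mount Cotton Hillclimb<br />Mount Cotton</a>';
const multiLine = '<a rel="noopener">Mount Cotton Hillclimb\n47-87 Gramzow Road\nMount Cotton</a>';

// Hypothetical helper: returns the first capture group, or null when
// the regex does not match, instead of throwing on content[1].
function safeExtract(text, regex) {
  const content = text.match(regex);
  return content ? content[1] : null;
}

const fromSingleLine = safeExtract(singleLine, pattern);
const fromMultiLine = safeExtract(multiLine, pattern); // null: "." stops at "\n"
const fromMultiLineDotAll = safeExtract(multiLine, /<a.*?>(.*)<\/a>/s);

console.log(fromSingleLine, fromMultiLine, fromMultiLineDotAll);
```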

Hey @Jorge_M!

Did you try using the HTML Extract node? Below is a workflow that might help :)

{
  "nodes": [
    {
      "parameters": {
        "functionCode": "return [\n{ \njson: \n{\n\"location\": `<a href=http:\" target=\"_blank\" rel=\"noopener\">Mount Cotton Hillclimb<br />47-87 Gramzow Road<br />Mount Cotton</a>`\n}\n}]\n"
      },
      "name": "json1",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        570,
        450
      ]
    },
    {
      "parameters": {
        "dataPropertyName": "location",
        "extractionValues": {
          "values": [
            {
              "key": "text",
              "cssSelector": "a"
            }
          ]
        },
        "options": {}
      },
      "name": "HTML Extract",
      "type": "n8n-nodes-base.htmlExtract",
      "typeVersion": 1,
      "position": [
        810,
        450
      ]
    }
  ],
  "connections": {
    "json1": {
      "main": [
        [
          {
            "node": "HTML Extract",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

Ahhh, brain fart @harshil1712, my bad. I forgot about that part; now it works perfectly :D 10000 thanks.

I am glad that it works! Have fun :)
