HTTP Request Javascript

Hello,

Is there a way to enable JavaScript for the HTTP Request node? The page I need to access and scrape requires JavaScript to function.

Thank you in advance

So this would basically be Selenium-based scraping.

This is not possible @Damian_K.

The HTTP Request node only fetches the response body. It does not load external resources or execute JavaScript.

Hey @Damian_K!

Welcome to the community! :slightly_smiling_face:

Sadly this is not possible. However, if your scraping workflow is not complex, you can use the HTTP Request node to fetch the page and then use the HTML Extract node to extract the data.

I found out that the data I need is either loaded with JS into an HTML table and/or stored as JSON in a JavaScript variable,

such as var ticketData = {"rows": [

And I have not yet seen an option to extract that with the HTML Extract node.

If the data that is returned is in JSON format, you can use the Function node to filter it. The HTML Extract node can only extract data that is returned as HTML.

All right, so if I understand correctly, I will have to extract that specific part of the page with JavaScript in the Function node.

This is how the flow can be: HTTP Request node (to make the HTTP Request) -> Function node (to extract the required data).

The problem is that the data returned is not JSON. It is JSON assigned to a JavaScript variable in the page's source code.
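For anyone following along, here is a minimal sketch of what pulling JSON out of such a variable assignment can look like in plain JavaScript. The page source, the sample row, and the helper name extractTicketData are made up for illustration; only the variable name ticketData comes from this thread. It also assumes the assignment ends with "};" and that no string value inside the JSON contains "};".

```javascript
// Hypothetical page source: the rows live in a script tag, not in the HTML table.
const html = `
<html><body>
<table id="tickets"></table>
<script>var ticketData = {"rows": [{"id": 1, "title": "Example"}]};</script>
</body></html>`;

// Hypothetical helper: capture the object literal between
// "var ticketData = " and the first "};", then parse it as JSON.
function extractTicketData(source) {
  const match = source.match(/var ticketData = (\{.*?\});/s);
  return match ? JSON.parse(match[1]) : null;
}

const ticketData = extractTicketData(html);
console.log(ticketData.rows);
```

This only works when the assigned value is strict JSON (double-quoted keys, no trailing commas). A JavaScript object literal that is not valid JSON would make JSON.parse throw.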

Can you share the workflow or the output? If it contains sensitive information, you can DM me instead.

I can't seem to find the button to PM you here.

Hey @Damian_K!

I tried a few solutions and this is what I have come up with. I am using the HTML Extract node to get all the script tags. Then I am using the Function node to only return the information that I need. You can use this workflow and build on top of it.

{
  "nodes": [
    {
      "parameters": {
        "url": "",
        "responseFormat": "string",
        "options": {}
      },
      "name": "HTTP Request",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 1,
      "position": [
        450,
        300
      ]
    },
    {
      "parameters": {
        "extractionValues": {
          "values": [
            {
              "key": "script",
              "cssSelector": "script",
              "returnValue": "html",
              "returnArray": true
            }
          ]
        },
        "options": {}
      },
      "name": "HTML Extract",
      "type": "n8n-nodes-base.htmlExtract",
      "typeVersion": 1,
      "position": [
        650,
        300
      ]
    },
    {
      "parameters": {
        "functionCode": "const funcRegex = /var ticketData/;\nconst scripts = items[0].json.script;\nconst data = [];\n// Keep only the script tags that define ticketData\nfor (const script of scripts) {\n  if (script && funcRegex.test(script)) {\n    data.push(script);\n  }\n}\nreturn [{json: data}];\n"
      },
      "name": "Function",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        850,
        300
      ]
    }
  ],
  "connections": {
    "HTTP Request": {
      "main": [
        [
          {
            "node": "HTML Extract",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTML Extract": {
      "main": [
        [
          {
            "node": "Function",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

I hope this helps :slightly_smiling_face:

Have you tried it out? It doesn't work for me: the Function node returns more than just the ticketData, so the output is not recognized as JSON.

All right, I now have a working regex that extracts exactly what I need:

/(?<=var ticketData = )(.+]})/

Yet i get this:
Entries exist but they do not contain any JSON data.

Hey @Damian_K!

I tried the regular expression you provided and it gives no result. When I logged the output I got null. I need to investigate the regular expression so that it extracts only the data you need.
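One possible reason for the null result, offered as a guess rather than a diagnosis: in JavaScript, "." does not match newlines unless the regex has the s (dotAll) flag, so a pattern like the one above fails when the JSON spans several lines. The sketch below uses a made-up multi-line script string to show the difference.

```javascript
// Hypothetical script content where the JSON spans several lines.
const script = 'var ticketData = {"rows": [\n  {"id": 1},\n  {"id": 2}\n]};';

// Same pattern as posted above, without and with the s (dotAll) flag.
const withoutDotAll = /(?<=var ticketData = )(.+]})/;
const withDotAll = /(?<=var ticketData = )(.+]})/s;

// Without the flag, ".+" stops at the first newline and never reaches "]}".
console.log(script.match(withoutDotAll)); // null

// With the flag, ".+" runs across newlines up to the last "]}".
const found = script.match(withDotAll);
const parsed = JSON.parse(found[1]);
console.log(parsed.rows.length); // 2
```

If the page delivers the whole assignment on a single line, both variants behave the same, so this only explains the null when the source contains line breaks inside the JSON.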

I got the solution, Function 1:

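let data

// items[0].json.script holds the contents of every script tag on the page
items[0].json.script.forEach(e => {
    // Skip empty script tags
    if (e === null) return

    // The s flag lets "." match newlines, so multi-line JSON is captured
    const matches = e.match(/(?:var ticketData = )(.*]})/s)

    // Parse the captured JSON and keep only its rows array
    if (matches && matches[1]) data = JSON.parse(matches[1]).rows
})

return [{ json: data }]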

Function 2:

// Split the rows array from Function 1 into one n8n item per row
const newItems = [];
for (const item of items[0].json) {
  newItems.push({json: item});
}
return newItems;
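To try the two steps outside n8n, here is a rough stand-alone sketch that chains them on sample data. The input array, the sample rows, and the wrapper names extractRows and splitRows are invented for illustration; inside n8n the same logic runs as the body of each Function node with items supplied by the previous node. The sketch also defaults to an empty array when nothing matches, which the snippets above do not do.

```javascript
// Stand-in for what the HTML Extract node hands to the first Function node
// (hypothetical sample data; one entry per script tag, some may be null).
const extracted = [{
  json: {
    script: [
      null,
      'console.log("unrelated");',
      'var ticketData = {"rows": [{"id": 1}, {"id": 2}]};',
    ],
  },
}];

// Function 1: find the ticketData script and parse out its rows.
function extractRows(items) {
  let rows = []; // default to empty so the next step never iterates undefined
  for (const script of items[0].json.script) {
    if (script === null) continue;
    const matches = script.match(/(?:var ticketData = )(.*]})/s);
    if (matches && matches[1]) rows = JSON.parse(matches[1]).rows;
  }
  return [{ json: rows }];
}

// Function 2: turn the single array into one item per row.
function splitRows(items) {
  return items[0].json.map((row) => ({ json: row }));
}

const result = splitRows(extractRows(extracted));
console.log(JSON.stringify(result)); // [{"json":{"id":1}},{"json":{"id":2}}]
```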