JSON / Javascript Question for Scraping

I am scraping a website page that consists of multiple listings. Each listing has a title, URL and type.

However, there’s no way to match each listing’s title, URL and type. Each of these 3 properties is scraped as a separate array - for example:

[ 
{ 
"title": [ 
"",
"Widget 1",
"Widget 2",
"Mega Widget"
], 

"URL": [ 
"https://dud.url",
"https://www.website.com/1",
"https://www.website.com/2",
"https://www.website.com/mega"
], 

"type": [ 
"Classified",
"Auction",
"Classified"
] 
} 
] 

There’s also a dud item in the title and URL arrays which I need to remove. It is always the first item in the array. So I’m looking for help with 2 transformations:

  1. How do I remove the first item of the title and URL arrays? I couldn’t get .shift() to work.

  2. How do I arrange the data per item rather than per property? My desired output format is:

[
"object 1":
{ 
title:
url:
type:
},

"object 2":
{ 
title:
url:
type:
},

"object 3":
{ 
title: 
url: 
type: 
}
]
  1. If your array is called URL, try this:
URL = URL.slice(1);

slice(1) returns a new array that starts at index 1 (the second element, since indexes start at 0) and runs to the end, unless you pass a second parameter.
Read more here as needed: Array.prototype.slice() - JavaScript | MDN
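
Since slice and splice are easy to mix up, here is a minimal sketch of how the two differ, using the sample URLs from the question. The variable names (urls, withoutDud, copy, removedTail) are only illustrative and are not part of the workflow.

```javascript
// Sample data copied from the question; variable names are illustrative only.
const urls = [
  "https://dud.url",
  "https://www.website.com/1",
  "https://www.website.com/2",
  "https://www.website.com/mega"
];

// slice(1) copies everything from index 1 onward and leaves `urls` untouched.
const withoutDud = urls.slice(1);
console.log(withoutDud.length); // 3
console.log(urls.length);       // 4

// splice(1) removes everything from index 1 onward IN PLACE and returns
// the removed elements, so the array it was called on keeps only the dud.
const copy = [...urls];
const removedTail = copy.splice(1);
console.log(copy);        // [ 'https://dud.url' ]
console.log(removedTail); // the three real URLs
```

Both calls hand back the three real URLs, which is why assigning either result back to the variable gives the same array here. Only slice leaves the original array alone.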

  2. I figured it out with a loop:
{
  "nodes": [
    {
      "parameters": {
        "functionCode": "return [ \n  {\n    \"json\": {\n    \"title\": [ \n      \"\",\n      \"Widget 1\",\n      \"Widget 2\",\n      \"Mega Widget\"\n    ], \n    \"URL\": [ \n      \"https://dud.url\",\n      \"https://www.website.com/1\",\n      \"https://www.website.com/2\",\n      \"https://www.website.com/mega\"\n    ], \n    \"type\": [ \n      \"Classified\",\n      \"Auction\",\n      \"Classified\"\n    ] \n    }\n  }\n] "
      },
      "name": "Make data",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        -2880,
        1360
      ]
    },
    {
      "parameters": {
        "functionCode": "// items[0].json.URL = items[0].json.URL.slice(1)\n// items[0].json.title = items[0].json.title.slice(1)\n\nitems[0].json.title.shift()\nitems[0].json.URL.shift()\n\nreturn items;"
      },
      "name": "Remove 1st title and URL",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        -2660,
        1360
      ]
    },
    {
      "parameters": {
        "functionCode": "const objects = []\n\nfor (let i = 0; i < items[0].json.title.length; i++) {\n  objects.push(\n    {\n      \"json\": {\n        title: items[0].json.title[i],\n        URL: items[0].json.URL[i],\n        type: items[0].json.type[i]\n      }\n    }\n  )\n  console.log(objects)\n}\nreturn objects"
      },
      "name": "Change data structure",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        -2440,
        1360
      ]
    }
  ],
  "connections": {
    "Make data": {
      "main": [
        [
          {
            "node": "Remove 1st title and URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Remove 1st title and URL": {
      "main": [
        [
          {
            "node": "Change data structure",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

Have a look at this and don’t hesitate to paste your JSON if you need help with any step :slight_smile:


Thanks Simon :slight_smile:

  1. Slice works! I want to understand why slice worked and shift did not. Is that because slice returns an array and shift does not? (And n8n needs an array to be returned, so shift is not going to work in this case.)

  2. I am trying to achieve the same result as your loop using the map function. I can create individual items with a json property, but each item contains the full array of titles (not just 1). Would love to hear your pointers for the code below:

return items[0].json.title.map( item => ( { json : { 'title': items[0].json.title} } ));
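  1. I didn’t know much about shift, but after some research, both will work. shift() changes the array in place and returns the removed element rather than the shortened array, so you just call the method without assigning its result. I updated my previous post, but it would look like this: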
items[0].json.title.shift()
items[0].json.URL.shift()

Docs: Array.prototype.shift() - JavaScript | MDN
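
To make the difference concrete, here is a small sketch of what shift() hands back, using the sample titles from the question. The names titles and removed are just for illustration.

```javascript
// Sample titles from the question; `titles` and `removed` are illustrative names.
const titles = ["", "Widget 1", "Widget 2", "Mega Widget"];

// shift() removes the first element in place and returns that element.
const removed = titles.shift();

console.log(removed); // "" (the dud entry)
console.log(titles);  // [ 'Widget 1', 'Widget 2', 'Mega Widget' ]

// Assigning the result back, e.g. `someArray = someArray.shift()`, would
// replace the whole array with the single removed element, which is a
// likely reason an assignment-style attempt appears not to work.
```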

  2. OK, so map also passes the index of the current element to its callback as a second argument (more here), so this is my solution:
{
  "nodes": [
    {
      "parameters": {
        "functionCode": "return [ \n  {\n    \"json\": {\n    \"title\": [ \n      \"\",\n      \"Widget 1\",\n      \"Widget 2\",\n      \"Mega Widget\"\n    ], \n    \"URL\": [ \n      \"https://dud.url\",\n      \"https://www.website.com/1\",\n      \"https://www.website.com/2\",\n      \"https://www.website.com/mega\"\n    ], \n    \"type\": [ \n      \"Classified\",\n      \"Auction\",\n      \"Classified\"\n    ] \n    }\n  }\n] "
      },
      "name": "Make data",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        -2880,
        1360
      ]
    },
    {
      "parameters": {
        "functionCode": "// items[0].json.URL = items[0].json.URL.slice(1)\n// items[0].json.title = items[0].json.title.slice(1)\n\nitems[0].json.title.shift()\nitems[0].json.URL.shift()\n\nreturn items;"
      },
      "name": "Remove 1st title and URL",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        -2660,
        1360
      ]
    },
    {
      "parameters": {
        "functionCode": "const objects = []\n\nfor (let i = 0; i < items[0].json.title.length; i++) {\n  objects.push(\n    {\n      \"json\": {\n        title: items[0].json.title[i],\n        URL: items[0].json.URL[i],\n        type: items[0].json.type[i]\n      }\n    }\n  )\n  console.log(objects)\n}\nreturn objects"
      },
      "name": "Change data structure",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        -2400,
        1240
      ]
    },
    {
      "parameters": {
        "functionCode": "objs = []\n\nitems[0].json.title.map((item, i) => objs.push(\n  {\n    \"json\": {\n      \"title\": item,\n      \"URL\": items[0].json.URL[i],\n      \"type\": items[0].json.type[i]\n    }\n  }\n))\n\nreturn objs"
      },
      "name": "Change structure with map()",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [
        -2400,
        1440
      ]
    }
  ],
  "connections": {
    "Make data": {
      "main": [
        [
          {
            "node": "Remove 1st title and URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Remove 1st title and URL": {
      "main": [
        [
          {
            "node": "Change data structure",
            "type": "main",
            "index": 0
          },
          {
            "node": "Change structure with map()",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

Let me know if you have any more questions I can help with :slight_smile:


Thanks so much Simon. You are correct about shift() working as well.

Your solution with map also works :raised_hands: I was playing around a bit more to learn myself and found this works too:

const data = items[0].json.title.map((item, i) => (
  {
    "json": {
      "title": item,
      "URL": items[0].json.URL[i],
      "type": items[0].json.type[i]
    }
  }
))

return data;
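
For anyone who wants to try both steps together outside n8n, here is a rough standalone sketch that can be run with plain Node.js. The scraped object stands in for items[0].json, and the toListings helper is a hypothetical name, not something n8n provides. The length guard is an added assumption for cases where the three arrays do not line up.

```javascript
// Hypothetical stand-in for items[0].json, using the sample data from the question.
const scraped = {
  title: ["", "Widget 1", "Widget 2", "Mega Widget"],
  URL: [
    "https://dud.url",
    "https://www.website.com/1",
    "https://www.website.com/2",
    "https://www.website.com/mega"
  ],
  type: ["Classified", "Auction", "Classified"]
};

// Hypothetical helper: drop the dud first entry, then build one object per listing.
function toListings({ title, URL, type }) {
  const titles = title.slice(1); // non-mutating removal of the dud title
  const urls = URL.slice(1);     // non-mutating removal of the dud URL
  // Assumption: if the arrays differ in length, only keep complete listings.
  const count = Math.min(titles.length, urls.length, type.length);
  return Array.from({ length: count }, (_, i) => ({
    json: { title: titles[i], URL: urls[i], type: type[i] }
  }));
}

const listings = toListings(scraped);
console.log(JSON.stringify(listings, null, 2));
```

Inside a Function node the same idea would take items[0].json as the input and return the resulting array.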

Happy to help <3 Have fun!
