JSON / JavaScript Question for Scraping

I am scraping a web page that consists of multiple listings. Each listing has a title, URL and type.

However, there’s no way to match each listing’s title, URL and type. Each of these 3 properties is scraped as a separate array - for example:

[
  {
    "title": [
      "",
      "Widget 1",
      "Widget 2",
      "Mega Widget"
    ],

    "URL": [
      "https://dud.url",
      "https://www.website.com/1",
      "https://www.website.com/2",
      "https://www.website.com/mega"
    ],

    "type": [
      "Classified",
      "Auction",
      "Classified"
    ]
  }
]

There’s also a dud item in the title and URL arrays which I need to remove. It is always the first item in the array. So I’m looking for help on 2 transformations:

  1. How do I remove the first item of the title and URL arrays? I couldn’t get .shift() to work.

  2. How do I arrange the data per item rather than per property? My desired output format is:

[
  "object 1":
  {
    title:
    url:
    type:
  },

  "object 2":
  {
    title:
    url:
    type:
  },

  "object 3":
  {
    title:
    url:
    type:
  }
]

  1. If your array is called URL, try this:
URL = URL.slice(1);

slice(1) returns a new array running from the element at index 1 (the second element, since indexes start at 0) to the end, unless you pass a second parameter.
Read more here as needed: Array.prototype.slice() - JavaScript | MDN
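
To make the slice behaviour concrete, here is a minimal sketch using the URL values from the question. The variable names `urls` and `cleaned` are made up for illustration and are not part of n8n.

```javascript
// Sample data copied from the question; the first entry is the dud.
const urls = [
  "https://dud.url",
  "https://www.website.com/1",
  "https://www.website.com/2",
  "https://www.website.com/mega",
];

// slice(1) copies everything from index 1 onward into a new array.
// The original array is left untouched.
const cleaned = urls.slice(1);

console.log(cleaned.length); // 3
console.log(urls.length);    // 4
```

Because slice does not modify the source array, the result has to be assigned somewhere, which is why the answer above writes it back to the same variable.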

  2. I figured it out with a loop:

Have a look at this and don’t hesitate to paste your JSON if you need help with any step 🙂
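
The loop from this reply did not survive in the thread, so the following is only a guess at its general shape, not the original code. It assumes the input object has the `title`, `URL` and `type` arrays from the question, that only `title` and `URL` carry a dud first entry, and it wraps each result in a `json` property the way the other snippets in this thread do. The names `listingsWithLoop` and `sample` are hypothetical.

```javascript
// Hypothetical helper: turns per-property arrays into per-listing objects.
function listingsWithLoop(input) {
  const titles = input.title.slice(1); // drop the dud first title
  const urls = input.URL.slice(1);     // drop the dud first URL
  const types = input.type;            // assumed to have no dud entry

  const result = [];
  for (let i = 0; i < titles.length; i++) {
    result.push({
      json: { title: titles[i], URL: urls[i], type: types[i] },
    });
  }
  return result;
}

// Sample data from the question.
const sample = {
  title: ["", "Widget 1", "Widget 2", "Mega Widget"],
  URL: [
    "https://dud.url",
    "https://www.website.com/1",
    "https://www.website.com/2",
    "https://www.website.com/mega",
  ],
  type: ["Classified", "Auction", "Classified"],
};

const loopResult = listingsWithLoop(sample);
console.log(loopResult.length); // 3
```

Inside an n8n Function node the equivalent would presumably be something like `return listingsWithLoop(items[0].json);`.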

Thanks Simon 🙂

  1. Slice works! I want to understand why slice worked and shift did not: is that because slice returns an array and shift does not? (And n8n needs an array to be returned, so shift is not going to work in this case.)

  2. I am trying to achieve the same result as your loop using the map function. I can create individual items with a json property, but each item contains the full array of titles (not just 1). Would love to hear your pointers for the code below:

return items[0].json.title.map( item => ( { json : { 'title': items[0].json.title} } ));

  1. I didn’t know much about shift(), but after some research, both will work. shift() removes the first element in place and returns that removed element, not the remaining array, so just call the method without assigning its result. I updated my previous post, but it would look like this:
items[0].json.title.shift();
items[0].json.URL.shift();

Docs: Array.prototype.shift() - JavaScript | MDN
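
A small sketch of why assigning the result of shift() goes wrong, using the titles from the question. The names `titleList`, `removed` and `wrong` are illustrative only.

```javascript
const titleList = ["", "Widget 1", "Widget 2", "Mega Widget"];

// shift() mutates titleList and hands back the element it removed.
const removed = titleList.shift();

console.log(removed === "");   // true: the dud empty string
console.log(titleList.length); // 3: the array itself is now cleaned

// Assigning the return value gives you the removed element,
// not an array, which is the likely cause of "shift didn't work".
const wrong = ["", "Widget 1"].shift();
console.log(Array.isArray(wrong)); // false
```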

  2. OK, so the map callback also receives the index of the current element as its second argument (more here), so this is my solution.

Let me know if you have any more questions I can help with 🙂

Thanks so much Simon. You are correct about shift() working as well.

Your solution with map also works 🙌 I was playing around a bit more to learn for myself and found that this works too:

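// Build one item per listing; i is the index that map() passes to the callback.
// Assumes the dud entries were already removed from title and URL.
const data = items[0].json.title.map((item, i) => (
  {
    "json": {
      "title": item,
      "URL": items[0].json.URL[i],
      "type": items[0].json.type[i]
    }
  }
));
return data;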
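
One thing the index-based approach relies on is that all three arrays line up once the duds are gone. In the sample data `type` has no dud, so `title` and `URL` need their first entry dropped before indexing. The sketch below combines both steps and adds a length check; the function name `toItems`, the variable `scraped` and the error message are assumptions for illustration, not anything n8n provides.

```javascript
// Hypothetical helper: drop the duds, verify alignment, then map by index.
function toItems(input) {
  const titles = input.title.slice(1); // dud removed
  const urls = input.URL.slice(1);     // dud removed
  const types = input.type;            // assumed to have no dud

  // Fail loudly if the arrays cannot be matched one-to-one.
  if (titles.length !== urls.length || titles.length !== types.length) {
    throw new Error("title, URL and type arrays do not line up");
  }

  return titles.map((title, i) => ({
    json: { title, URL: urls[i], type: types[i] },
  }));
}

// Sample data from the question.
const scraped = {
  title: ["", "Widget 1", "Widget 2", "Mega Widget"],
  URL: [
    "https://dud.url",
    "https://www.website.com/1",
    "https://www.website.com/2",
    "https://www.website.com/mega",
  ],
  type: ["Classified", "Auction", "Classified"],
};

const mapped = toItems(scraped);
console.log(mapped[2].json.title); // "Mega Widget"
```

The check is optional, but without it a missing entry on the scraped page would silently pair a title with `undefined` or with the wrong listing's type.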

Happy to help <3 Have fun!