How to get the main image URL from a web article using n8n?

Hello everyone, the n8n team and the amazing @Jon.
I still love using n8n and am trying to improve my knowledge of this amazing software. Here is my problem of the day:

I have a feed fed by a community node (the RSS trigger) that sends web news to Discord and a Notion database, along with their URLs. However, I would also like to get the URL of the main image of each news page. For example, Discord does this automatically for most URLs that appear in posts.

I guess I should use an HTML Extract node to read an element from the article’s meta tags, but I don’t know which meta tag to use, and I can’t manage to extract anything from the page. Or maybe this is the wrong way to do it?

Thanks for your answers!
Here is an example of the JSON that I get after the RSS trigger node and the Set node:

[
  {
    "title": "VIDÉO. Première navigation du For People de Thomas Ruyant : « L'impression de barrer un ...",
    "contentSnippet": "C'est sur ce plan Antoine Koch – Finot Conq que le skipper nordiste participera aux courses du circuit Imoca avec en ligne de mire le prochain Vendée ...",
    "link": "https://www.francelive.fr/teaser/voiles-et-voiliers/video-premiere-navigation-du-for-people-de-thomas-ruyant-limpression-de-barrer-un-deriveur-7970994/",
    "pubDate": "2023-04-14T09:59:01.000Z"
  }
]

Hi @comedepreville

You would need to use the HTTP Request node to fetch that page and then extract the image from the HTML that is returned.
The markup can differ from site to site, though, so you would need to experiment with a few pages to see which approach works best.
I'm not sure how Discord and similar services decide which image to use here. You might be able to find out.
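
As a rough illustration of the "fetch the page, then extract the image" idea: many sites declare a preview image in an Open Graph `og:image` meta tag (or a `twitter:image` tag), and link-preview services commonly read those. The sketch below assumes the page HTML has already been fetched and that the site sets one of these tags, which is not guaranteed. The function name `extract_main_image` and the order of the tags checked are choices made for this example, not anything n8n or Discord prescribes.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class MetaImageParser(HTMLParser):
    """Collects candidate preview-image URLs from <meta> tags."""

    # Checked in this order of preference (an assumption for this sketch).
    KEYS = ("og:image", "og:image:url", "twitter:image")

    def __init__(self) -> None:
        super().__init__()
        self.found = {}  # meta key -> content value (first occurrence wins)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalise them.
        attr_map = {name: (value or "") for name, value in attrs}
        key = (attr_map.get("property") or attr_map.get("name") or "").lower()
        content = attr_map.get("content", "").strip()
        if key in self.KEYS and content and key not in self.found:
            self.found[key] = content


def extract_main_image(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the declared preview image, or None."""
    parser = MetaImageParser()
    parser.feed(html)
    for key in MetaImageParser.KEYS:
        if key in parser.found:
            # Resolve relative paths against the article URL.
            return urljoin(page_url, parser.found[key])
    return None


if __name__ == "__main__":
    sample = '<html><head><meta property="og:image" content="/img/cover.jpg"></head></html>'
    print(extract_main_image(sample, "https://example.com/news/article-1"))
```

Inside n8n itself, the same idea would probably be an HTTP Request node on the item's `link` field followed by an HTML Extract node using a CSS selector such as `meta[property="og:image"]` and returning the `content` attribute. Treat that as a starting point to test against your own feeds, since some pages will not include the tag.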
