How to Specify Meta Tags in HTML Extract?

Hi everyone! I’m trying to extract Open Graph meta tags from some HTML using the HTML Extract node, but it isn’t working. I can extract all other tags, but these meta tags are giving me grief. Is there a special way to select them? Dot notation works on elements like <div class=""></div>, but not on <meta property="" content=""/>.

Hopefully there’s an easier way using only the HTML Extract node, but I ended up extracting the content and property attributes as separate arrays and then pairing them into key-value pairs in a Function node with this snippet:

// Attributes extracted by the HTML Extract node as two parallel arrays
const properties = $node["HTML Extract"].json["property"]; // e.g. "og:title"
const contents = $node["HTML Extract"].json["content"];    // matching values

const result = {};

// Pair each property name with the content value at the same index
for (let i = 0; i < properties.length; i++) {
  result[properties[i]] = contents[i];
}

return [{ json: { result } }];
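The snippet above relies on `$node`, which only exists inside n8n. As a sketch of the same pairing logic in plain JavaScript (the function name `pairAttributes` and the sample values are made up for illustration), it can be pulled into a small function and tried outside the workflow:

```javascript
// Hypothetical helper: pairs two parallel arrays (property names and their
// content values) into a single { property: content } object.
function pairAttributes(properties, contents) {
  const result = {};
  // Only pair indices present in both arrays, so a length mismatch
  // cannot produce undefined keys or values.
  const n = Math.min(properties.length, contents.length);
  for (let i = 0; i < n; i++) {
    result[properties[i]] = contents[i];
  }
  return result;
}

// Made-up sample input standing in for the HTML Extract node's output
const properties = ["og:title", "og:type", "og:url"];
const contents = ["Example Page", "website", "https://example.com/"];

const ogTags = pairAttributes(properties, contents);
console.log(ogTags["og:title"]); // → Example Page
```

The `Math.min` guard is a design choice: if one attribute is missing on a tag the two arrays drift out of alignment anyway, but at least no `undefined` entries are created. `Object.fromEntries(properties.map((p, i) => [p, contents[i]]))` is a shorter equivalent when the arrays are known to match.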

Behind the scenes, the node uses cheerio, so you can use any selector cheerio supports, such as the attribute selector meta[property="og:title"].


Thanks! This is much better. I wasn’t sure how to format the CSS selector, so this helps a lot, and I think it’s much cleaner.


Sorry to piggyback on your question, but what could I replace meta[property="og:title"] with to extract the title?