Hi there, I’m trying to capture an array of values from an HTML document. The document contains malformed HTML, so I can’t use the HTML Extract node.
I’ve worked out the regular expression:
(?<="a-size-base review-text review-text-content"> <span> )(.*?)(?=<\/span>)
But it doesn’t work in my Function node:
let source = $node["Get HTML"].json["data"];
let ReviewText = RegExp(/(?<="a-size-base review-text review-text-content"> <span> )(.*?)(?=<\/span>)/).exec(source);
return [{json:{ReviewText}}]
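One reason the result can never be an array: exec() on a regex without the g flag returns only the first match (or null), not every match. Below is a minimal sketch, assuming the raw HTML string is what $node["Get HTML"].json["data"] holds, that keeps the same pattern, adds the g flag, and collects every capture with String.prototype.matchAll(). The function name toReviewItems is hypothetical; the one-item-per-review return shape follows the [{json: ...}] convention already used in the code above.

```javascript
// Sketch: return one n8n item per review instead of only the first match.
// Assumption: `source` is the raw HTML string from the "Get HTML" node.
function toReviewItems(source) {
  // Same pattern as in the question, plus the global `g` flag, which
  // matchAll() requires in order to iterate over every occurrence.
  const pattern =
    /(?<="a-size-base review-text review-text-content"> <span> )(.*?)(?=<\/span>)/g;
  const items = [];
  for (const match of source.matchAll(pattern)) {
    // match[1] is the text captured by (.*?); trim the padding spaces.
    items.push({ json: { ReviewText: match[1].trim() } });
  }
  return items;
}

// Hypothetical Function node usage:
// const source = $node["Get HTML"].json["data"];
// return toReviewItems(source);

const demo =
  '<span class="a-size-base review-text review-text-content"> <span> first review </span> </span>' +
  '<span class="a-size-base review-text review-text-content"> <span> second review </span> </span>';
console.log(JSON.stringify(toReviewItems(demo)));
// prints [{"json":{"ReviewText":"first review"}},{"json":{"ReviewText":"second review"}}]
```

If this still returns an empty array on the real page, the pattern itself is not matching there, which points at the exact text the lookbehind expects rather than at exec().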
The result is always null. Here is a small part of the document:
{"pageNumber":"1","reviewerType":"avp_only_reviews"}" class="a-link-normal" href=</a></span></div><div class="a-row a-spacing-small review-data"><span data-hook="review-body" class="a-size-base review-text review-text-content"> <span> blah blah blah blah blah </span> </span></div><div class="a-row review-comments comments-for-R1AS0L0P8AKBI1"><div data-reftag="cm_cr_arp_d_cmt_opn" aria-live="polite" data-a-expander-name="review_comment_expander" class="a-row a-expander-container a-expander-inline-container cr-vote-action-bar"><span class="cr-vote" data-hook="review-voting-widget"> <div class="cr-helpful-button aok-float-left"> <span class="a-button a-button-base"><span class="a-button-inner"><
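The lookbehind only matches when the text before each review is character-for-character identical to the pattern, including the single spaces around <span>, and . does not match line breaks. The pasted snippet does match the original pattern, so one plausible (unconfirmed, since the full page was not posted) explanation for null is that the live page has newlines or different whitespace at that point. A sketch under that assumption, replacing the lookbehind with a capture group and \s*; extractReviewTexts is a hypothetical name.

```javascript
// Sketch, assuming the live page differs from the pasted sample only in
// whitespace (e.g. a newline after <span>). This is an assumption, not a
// confirmed diagnosis.
function extractReviewTexts(html) {
  // A capture group instead of a lookbehind: each \s* absorbs any run of
  // spaces, tabs or newlines between the tags, and [\s\S]*? (unlike .*?)
  // can cross line breaks. The trailing \s* keeps padding out of the capture.
  const pattern =
    /"a-size-base review-text review-text-content"\s*>\s*<span>\s*([\s\S]*?)\s*<\/span>/g;
  // Keep only capture group 1 from every match.
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

// Hypothetical Function node usage (one item per review):
// const html = $node["Get HTML"].json["data"];
// return extractReviewTexts(html).map((ReviewText) => ({ json: { ReviewText } }));
```

Using .*? together with the s (dotAll) flag would also let the capture span line breaks; [\s\S]*? is shown because it behaves the same without relying on that flag.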
Any chance of getting a dedicated core Regex node that would let us fine-tune the inputs and expected outputs more easily, without needing the Function node?