Get rid of weird HTML tags in RSS text

Hello! I’m a new user of n8n and a big fan of it. Thanks for this amazing software.

I created this workflow to send new RSS items to a Discord channel, but the text keeps the weird HTML from the RSS feed, such as `L&#39;` instead of `L’`, or `<strong>` tags.

How can I get clean text?

I’m calling on the amazing @Jon for help with this one :grin: :sweat_smile: :star_struck:

Hi! I’m not the amazing Jon, but here’s one possible solution:

After the RSS node, add a Set node that uses a regular expression to clean up the text.

For example, this expression removes the HTML tags and replaces every `&#39;` with `'` in the title.

{{ $json.title.replace(/(<([^>]+)>)/gmi, "").replace(/&#39;/g, "'") }}
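In JavaScript, `replace()` with a plain string pattern only changes the first match, so a title containing two apostrophes would still show one `&#39;`; a regex with the `g` flag (or `replaceAll()`) changes all of them. Here is a minimal standalone sketch of the same cleanup that can be tried outside n8n; the `cleanTitle` name and the sample text are hypothetical.

```javascript
// Hypothetical helper mirroring the Set-node expression, runnable outside n8n.
function cleanTitle(title) {
  return title
    .replace(/<[^>]+>/g, "") // drop anything shaped like an HTML tag
    .replace(/&#39;/g, "'"); // g flag: replace every &#39;, not only the first
}

const sample = "<strong>L&#39;été</strong> et l&#39;hiver";

// A string pattern stops after the first match:
console.log(sample.replace("&#39;", "'")); // <strong>L'été</strong> et l&#39;hiver
// The global regex version cleans the whole title:
console.log(cleanTitle(sample)); // L'été et l'hiver
```

The tag regex is deliberately simple: it would also remove text such as `a < b > c`, which is usually acceptable for RSS titles.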

Your workflow with an example Set node:

Thanks a lot, amazing @deborah! You rock :metal:

@comedepreville, there might be an even easier way using one of our helper functions: the removeTags() function might be just what you’re looking for :slight_smile:

Thanks @Niklas_Hatje. I used the removeTags() function instead of the regex. However, I didn’t find any helper function to convert those HTML entities, such as `&#39;` or `&quot;`, back into their normal characters, `'` and `"`. I still have to use the replaceAll() function. If you have any idea of an easier way to do it, that would be amazing!

Yeah, you’re right… unfortunately, there isn’t a function for this yet.
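Until such a helper exists, one workaround is a small decoder that handles the common entities in a single regex pass, for example inside a Code node. This is a sketch under assumptions: the `decodeEntities` name and the entity table are hypothetical, only the listed named entities plus numeric ones (`&#39;`, `&#x27;`) are decoded, and anything else is left untouched. Decoding in one pass also avoids a pitfall of chained `replaceAll()` calls: if `&amp;` is replaced first, `&amp;quot;` wrongly ends up as `"` instead of `&quot;`.

```javascript
// Hypothetical helper: decode common HTML entities in one pass.
// Named entities not listed here are left as-is.
const NAMED_ENTITIES = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
  ["nbsp", "\u00a0"],
]);

function decodeEntities(text) {
  // Matches &#x27; (hex), &#39; (decimal) or &name; (named)
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range numbers untouched instead of throwing
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Map lookup avoids matching inherited keys like "constructor"
    return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
  });
}

console.log(decodeEntities("L&#39;été &amp; &quot;hiver&quot;")); // L'été & "hiver"
```

In a Code node set to run once for all items, it could be applied with something like `return $input.all().map(item => ({ json: { ...item.json, title: decodeEntities(item.json.title) } }));`, assuming the field is `title` as in the Set node example. A hand-written table like this covers only a handful of entities; full HTML coverage would need the complete list of named character references.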
