How to remove HTML tags from an RSS feed or from the text output

Hi everyone.

I am not a programmer and not a native English speaker, but I will do my best to explain:

(IN SHORT) I need to remove HTML tags from a text: <a href, <p, … and so on.

(LONG DESCRIPTION) I retrieve the feed from an RSS aggregator. I extract the content with n8n and translate it with DeepL in order to post the content to a blog.
The result of the feed extraction contains only HTML code with all the formatting, anchors, and so on. The output (the translated content) of the DeepL API is also full of HTML tags.
I need to have only the formatted text or at least (though not as good) the plain text.

I tried almost every solution in n8n.
I am only a casual coder in C#, but I know how to parse HTML text and get rid of the unwanted HTML tags.
I do not know JavaScript, which I saw is used in n8n, but maybe I could implement some JS code.
Any directions or suggestions, please?
Thank you,
Daniel

I run the n8n desktop app.
There are no errors; the output text is just not clean, but full of the original anchors and nasty tags.
I also tried the HTML Extract node, but maybe I did something wrong: there I get different errors depending on the node's configuration.
One of them is: ERROR: No property named " some text here " exists!

Hey @sheiku,
welcome to the community :tada:

I just looked into the RSS feed and I see a contentSnippet field that seems to be the content field without any HTML tags. Is that what you were looking for?
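
To illustrate the idea of preferring contentSnippet over content, here is a minimal JavaScript sketch. It assumes each feed item is a plain object with optional content and contentSnippet string fields; the helper name pickPlainText and the sample item are hypothetical and not part of n8n.

```javascript
// Hypothetical helper (not an n8n API): prefer the tag-free contentSnippet
// field when it is present, otherwise fall back to the raw content field.
function pickPlainText(item) {
  const snippet = item.contentSnippet;
  if (typeof snippet === "string" && snippet.trim() !== "") {
    return snippet;
  }
  // Fallback: this may still contain HTML tags.
  return typeof item.content === "string" ? item.content : "";
}

// Made-up sample item, only for illustration.
const sampleItem = {
  title: "Example post",
  content: "<p>Hello <b>world</b></p>",
  contentSnippet: "Hello world",
};

console.log(pickPlainText(sampleItem));
```

If a feed does not provide contentSnippet, the fallback returns the raw content, which would still need its tags removed in a later step.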

If you need to remove HTML tags yourself, you could use a Set node expression with a regex like this.

[Screenshot: Set node with a regex expression]

Here is an example workflow to illustrate.
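
The exact expression from the post is not reproduced in text, so the following is only a sketch of the kind of regex replacement such an expression might use. The pattern /<[^>]*>/g and the helper name stripHtmlTags are assumptions, not the original author's code. The same replace call could presumably be used inside an n8n expression on the content field.

```javascript
// Hypothetical helper (not an n8n API): remove anything that looks like
// an HTML tag, i.e. "<", then any characters except ">", then ">".
function stripHtmlTags(html) {
  return String(html ?? "")
    .replace(/<[^>]*>/g, "") // drop opening and closing tags
    .replace(/\s+/g, " ")    // collapse runs of whitespace and newlines
    .trim();
}

// Made-up sample input, only for illustration.
const sample = '<p>Hello <a href="https://example.com">world</a>!</p>';
console.log(stripHtmlTags(sample)); // prints "Hello world!"
```

This approach is deliberately simple: it does not decode entities such as &amp;, and text from adjacent block elements (for example two consecutive paragraphs) is joined without a space. For heavily formatted feeds, a real HTML parser would be more robust than a regex.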

Many thanks, @marcus,
I will try your example.
Yes, I have been trying to clean the text like in your example, but I get errors or the text is not cleaned.
I will come back with an answer…

@marcus, it seems your approach is working!
Wow, it was easier for me (as a non-coder) to learn C# than to understand regex… for me, regex is unknown territory.
I added the DeepL translation and selected only the title and content fields of the RSS feed as the translated output: {{$json["content"]}} {{$json["title"]}}
And it is working.
I will tweak the results some more.
Thank you again
