XML Node: XML To JSON - ERROR: Invalid character in entity name

I am using the HTTP Node to fetch the website of a new article.

The website returns an HTML page containing, among other things, an array of items, each with a header, image and description of an event.

I want to use the XML node (XML to JSON) to convert the HTML into JSON. Unfortunately, the website uses characters that are invalid in XML for these items.

I was wondering if there is a way to auto convert or remove these unsafe characters when passing it to the XML or if there is a node I can use before using the XML that would make the HTML code XML safe.

Additionally, I run into this issue quite often, with articles sometimes containing unescaped characters such as ", ', ` and especially this character: ’. This one seems to break the JavaScript syntax highlighter, and I can't use a regex to filter or replace it.

How would I replace those characters too?
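One possible approach to the quote characters, sketched in plain JavaScript: write them as Unicode escapes (for example \u2019 for the right single quotation mark) so the literal character never appears in the code and cannot confuse a syntax highlighter. The helper name normalizeQuotes is hypothetical, and the set of characters it covers is an assumption about what the articles contain.

```javascript
// Hypothetical helper: map typographic quotation marks to their ASCII equivalents.
// The characters are written as Unicode escapes, so the source stays plain ASCII.
function normalizeQuotes(text) {
  return text
    .replace(/[\u2018\u2019]/g, "'")   // left and right single quotation marks
    .replace(/[\u201C\u201D]/g, '"');  // left and right double quotation marks
}

// Example input, also written with escapes instead of the literal characters.
const cleaned = normalizeQuotes("It\u2019s a \u201Cnew\u201D article");
console.log(cleaned);
```

Since this is ordinary string handling, it should be usable in a Code or Function node placed before the XML node, applied to whichever field holds the fetched HTML.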


hello @Bredda

It would be better to provide some samples of the ‘broken’ HTML page and a workflow.

Hi, thank you for your fast response. I will give you an even better one: I will share the link to the page so you can test it freely.

example data:
This is where the error suggests the problem is, but I don't see it.

Error:

Error: Invalid character in entity name
Line: 5
Column: 36
Char:  
    at error (/usr/local/lib/node_modules/n8n/node_modules/sax/lib/sax.js:652:10)
    at strictFail (/usr/local/lib/node_modules/n8n/node_modules/sax/lib/sax.js:678:7)
    at SAXParser.write (/usr/local/lib/node_modules/n8n/node_modules/sax/lib/sax.js:1499:13)
    at Parser.exports.Parser.Parser.parseString (/usr/local/lib/node_modules/n8n/node_modules/xml2js/lib/parser.js:327:31)
    at Parser.parseString (/usr/local/lib/node_modules/n8n/node_modules/xml2js/lib/parser.js:5:59)
    at /usr/local/lib/node_modules/n8n/node_modules/xml2js/lib/parser.js:342:24
    at new Promise (<anonymous>)
    at Parser.exports.Parser.Parser.parseStringPromise (/usr/local/lib/node_modules/n8n/node_modules/xml2js/lib/parser.js:340:14)
    at Parser.parseStringPromise (/usr/local/lib/node_modules/n8n/node_modules/xml2js/lib/parser.js:5:59)
    at Object.execute (/usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes/Xml/Xml.node.js:234:47)
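
A common cause of this sax error is an ampersand that does not start a valid entity reference, such as "Food & Drink" in text or "?a=1&b=2" in a link. Whether that is what sits at line 5, column 36 of this page is an assumption, since the sample is not shown. A minimal sketch of a pre-processing step that escapes such ampersands (the helper name escapeBareAmpersands is hypothetical):

```javascript
// Hypothetical helper: escape every "&" that does not begin an entity reference.
function escapeBareAmpersands(markup) {
  // The negative lookahead leaves "&name;", "&#123;" and "&#x1F;" untouched;
  // any other "&" is rewritten to "&amp;".
  return markup.replace(
    /&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)/g,
    "&amp;"
  );
}

const sample = '<a href="/list?a=1&b=2">Food & Drink &amp; more</a>';
console.log(escapeBareAmpersands(sample));
```

This only addresses bare ampersands. Named HTML entities such as &nbsp; are not predefined in XML, and unclosed tags like <br> or <img> are not well-formed XML either, so a strict parser may still reject the page after this step.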

Wasn't sure if I clicked on reply to you, so here I am replying to make sure :sweat_smile:

Sorry if I bothered you.

But why are you trying to validate HTML as XML? It’s not the same

I suppose you need something like this

Well, the XML node conveniently converted the HTML into JSON, so I used it for that. It works well for some websites and does not work well with others.

I have managed to use the HTML node to break down the website I am looking for to grab the items I want.

Unfortunately, this method is rather website-specific, and I wish to have a more universal solution so I can scale this to more than one website.

There is no universal solution :slight_smile:

The site you provided earlier does not return its content as XML. Some sites will return XML if you send the header Accept: application/xml, but it depends on the site.
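
As a rough sketch of that check outside n8n: request the page with the Accept header and inspect the Content-Type of the response before handing the body to an XML parser. The function names are hypothetical, and the code assumes a runtime with a global fetch (for example Node.js 18 or later).

```javascript
// Hypothetical helper: decide whether a Content-Type header value denotes XML.
function isXmlContentType(contentType) {
  if (!contentType) return false;
  // Drop parameters such as "; charset=utf-8" and compare the media type only.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return (
    mediaType === "application/xml" ||
    mediaType === "text/xml" ||
    mediaType.endsWith("+xml")
  );
}

// Ask the server for XML and report whether it actually sent XML back.
// Assumes a global fetch is available in the runtime.
async function fetchPreferringXml(url) {
  const response = await fetch(url, { headers: { Accept: "application/xml" } });
  const body = await response.text();
  return { isXml: isXmlContentType(response.headers.get("content-type")), body };
}
```

If isXml comes back false, the server ignored the Accept header and the body is most likely still HTML, so it would need an HTML parser rather than the XML node.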

I do not see an easy way to convert an arbitrary site to XML, as it will basically have the same issues as working with HTML directly.

XML works well with structured data (like tables or lists of something), but HTML is a mess.

I see. I was hoping there would be some sort of easy parser that would deal with this.

Anyways. Thank you for the help!
