Extract HTML Content node is removing spaces

I am pulling data from HTML using the Extract HTML Content node. Here is an example of the HTML I am pulling:
image

I find the selector and set the key, then set it to text. At the bottom I select “Clean Up Text”.
Here is what the output gives me:
image

The node is removing spaces between words. It happens about every 10 words or so.

I thought maybe there were line breaks in there that I couldn’t see, and that the node was removing them without putting a space in their place. But here is the raw input, with no line breaks or anything of that nature:

image

Here are the nodes giving me the issue:

  • 1.54.4
  • Ubuntu
  • Docker
  • DigitalOcean


@djjace, here is what the “Clean Up Text” option does, according to its description:

Whether to remove leading and trailing whitespaces, line breaks (newlines) and condense multiple consecutive whitespaces into a single space

If you look at the actual HTML (with “Clean Up Text” turned off), you will see that there is a “new line” character, \n, in the text. Removing that character without putting anything in its place leaves the two words on either side of it joined together.
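To illustrate the effect, here is a minimal sketch in plain JavaScript. It only approximates the behaviour described above and is not the node's actual implementation; the sample string and variable names are made up for the example.

```javascript
// Hypothetical sample text with a hidden line break between two words.
const raw = 'the quick brown\nfox jumps';

// Dropping the line break outright glues the neighbouring words together.
const glued = raw.replace(/\n/g, '');

// Replacing the line break with a space keeps the words apart.
const spaced = raw.replace(/\n/g, ' ');

console.log(glued);  // the quick brownfox jumps
console.log(spaced); // the quick brown fox jumps
```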

To fix this, that is, to put a space in place of each new line, you can apply a bit of RegEx, as shown below.

In other words, you do the clean up manually instead of relying on the automatic clean up that comes with the “Clean Up Text” option.

More specifically:

  1. Turn off “Clean Up Text”
  2. Apply RegEx in the form {{ $json.about.replaceAll(/\n\n/gm, '^').replaceAll(/\n/gm, ' ').replaceAll('^', '\n') }}

As a result, \n\n is replaced with a “new line” (single \n) while a single (standing alone) “new line” is replaced with a space.
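The same chain can be tried outside n8n as a quick check. This is only a sketch: the function name normalizeLineBreaks and the sample text are hypothetical, and it assumes a runtime that supports String.prototype.replaceAll. Note that '^' is used as a temporary placeholder, so a literal '^' already present in the text would also be turned into a new line.

```javascript
// Hypothetical helper mirroring the expression from step 2.
function normalizeLineBreaks(text) {
  return text
    .replaceAll(/\n\n/gm, '^') // park double line breaks behind a placeholder
    .replaceAll(/\n/gm, ' ')   // turn the remaining single line breaks into spaces
    .replaceAll('^', '\n');    // restore each parked break as a single new line
}

// Made-up sample: one wrapped paragraph followed by a second paragraph.
const sample = 'first line of the\nparagraph\n\nsecond paragraph';
const result = normalizeLineBreaks(sample);

console.log(JSON.stringify(result));
// "first line of the paragraph\nsecond paragraph"
```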

This is just an example. Apply your own logic that suits you best.


aaahhhhhh

gotcha. that makes sense!

thanks!
