Extract HTML Content node is removing spaces

djjace · August 25, 2024, 2:24am

I am pulling data from HTML using the Extract HTML node. Here is an example of the HTML I am pulling:

I find the selector and set the key, then set it to Text. At the bottom I select Clean Up Text.
Here is what the output gives me:

The node is removing spaces between words…it happens about every 10 words or so.

I thought maybe there were line breaks in there that I couldn’t see, and that the node was removing them without putting a space in their place. But here is the raw input, with no line breaks or anything of that nature:

Here are the nodes giving me the issue:

n8n version: 1.54.4
Operating system: Ubuntu
Running n8n via: Docker
Hosting: DigitalOcean

n8n · August 25, 2024, 2:24am

It looks like your topic is missing some important information. Could you provide the following, if applicable?

n8n version:
Database (default: SQLite):
n8n EXECUTIONS_PROCESS setting (default: own, main):
Running n8n via (Docker, npm, n8n cloud, desktop app):
Operating system:

ihortom · August 25, 2024, 5:59pm

@djjace, here is what the “Clean Up Text” option does:

Whether to remove leading and trailing whitespaces, line breaks (newlines) and condense multiple consecutive whitespaces into a single space

If you look at the actual HTML (with “Clean Up Text” turned off), you will see that there is a newline character, \n, in the text. Removing that character without inserting a space joins the two words on either side of it.
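To make the effect concrete, here is a minimal sketch of a cleanup that behaves the way the option description reads. It is only an approximation written for illustration, not n8n's actual source; the function name cleanUpTextSketch and the sample string are made up.

```javascript
// Hypothetical approximation of the described behavior, NOT n8n's real code:
// trim the ends, drop line breaks outright, then collapse runs of whitespace.
function cleanUpTextSketch(value) {
  return value
    .trim()
    .replace(/\r\n|\n|\r/g, "") // line breaks are removed, not replaced by a space
    .replace(/\s+/g, " ");      // consecutive whitespace becomes a single space
}

// Made-up sample: one hidden newline and one double space.
const raw = "first line ends here\nsecond line  starts here";
const cleaned = cleanUpTextSketch(raw);

console.log(cleaned);
// first line ends heresecond line starts here
```

Because the newline is dropped before any space is put in its place, “here” and “second” end up glued together, which matches the symptom in the first post.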

To fix this, that is, to put a space in place of each newline, you can apply a bit of RegEx yourself.

In other words, you do the cleanup manually instead of relying on the automatic one that comes with the “Clean Up Text” option.

More specifically:

Turn off “Clean Up Text”
Apply RegEx in the form {{ $json.about.replaceAll(/\n\n/gm, '^').replaceAll(/\n/gm, ' ').replaceAll('^', '\n') }}

As a result, each double newline (\n\n) is replaced with a single \n, while each standalone newline is replaced with a space.

This is just an example. Apply whatever logic suits you best.
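If you want to try the replacement chain outside n8n first, the same three calls can be run as plain JavaScript. This is just a sketch: the about string stands in for $json.about and is made up, and joinSoftBreaks is a hypothetical helper name. It needs a runtime that supports String.prototype.replaceAll.

```javascript
// Made-up sample standing in for the value of $json.about.
const about =
  "We build tools\nfor automation.\n\nFounded in 2019\nby two friends.";

// Hypothetical helper wrapping the same three replaceAll calls.
function joinSoftBreaks(text) {
  return text
    .replaceAll(/\n\n/gm, "^") // park double newlines on a placeholder
    .replaceAll(/\n/gm, " ")   // remaining single newlines become spaces
    .replaceAll("^", "\n");    // turn each placeholder into one newline
}

const result = joinSoftBreaks(about);

console.log(JSON.stringify(result));
// "We build tools for automation.\nFounded in 2019 by two friends."
```

One thing to watch: ^ is used as a temporary placeholder, so any literal ^ already present in the text would also be turned into a newline. If your data can contain that character, pick a placeholder that cannot appear in it.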

djjace · August 26, 2024, 7:38pm

aaahhhhhh

gotcha. that makes sense!

thanks!

system · September 2, 2024, 7:38pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.