Extract html adding body and head

Valdri · January 8, 2024, 7:03pm

Hi there,
I want to extract data from a table on a website using n8n.
I first download the page content using Browserless and then use the HTML Extract node to extract the data with these settings:

Then, from the output of that node, I want to extract every td element.

But it only returns an empty array.

For testing I used “*” in html2 node CSS selector and this was the output:

<head></head><body>239188988\n                            08.01.2024 17:00\n                            5\n                            DAGRA\n                            Przekazano do urzędu pocztowego (WYSLANY)\n                            \n                                <a href="ajax/ajax_order_packages.php?courier=pocztapolska&amp;get_envelope_labels=239188988" target="_blank">Etykiet</a> &nbsp;|&nbsp; \n                                <a href="ajax/ajax_order_packages.php?courier=pocztapolska&amp;get_envelope_book=239188988" target="_blank">K. nadawcza</a> | \n                                <a href="ajax/ajax_order_packages.php?courier=pocztapolska&amp;get_pocztafirmowa_book=239188988" target="_blank">Poczta firmowa</a>\n                            </body>

I don’t know why body or head suddenly appeared in the output and there were no td elements.

Information on your n8n setup

n8n version: 1.16.0

n8n · January 8, 2024, 7:03pm

It looks like your topic is missing some important information. Could you provide the following if applicable.

n8n version:
Database (default: SQLite):
n8n EXECUTIONS_PROCESS setting (default: own, main):
Running n8n via (Docker, npm, n8n cloud, desktop app):
Operating system:

Jon · January 9, 2024, 1:32pm

Hey @Valdri,

Can you share the site you are trying to download the table from? When working with HTML, the package we use expects complete, valid HTML to work properly, which could be why it is adding the missing elements (head and body) and also why the td extraction is not returning anything.
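
To illustrate the point about complete HTML: under the HTML parsing rules, a spec-compliant parser that meets a bare `<tr>` or `<td>` outside a `<table>` ignores those tags and keeps only their text, which would match the output above. The following is a minimal sketch of one possible workaround, wrapping the row fragment in a full document before selecting cells. It is written in Python with the standard library only, not with the package n8n uses, and the function names `wrap_table_fragment` and `extract_cells` as well as the sample markup are hypothetical. Note that Python's `html.parser` is lenient and would find the cells even without wrapping, so the wrapping step is what matters for stricter parsers.

```python
import re
from html.parser import HTMLParser


def wrap_table_fragment(fragment: str) -> str:
    """Wrap bare <td>/<tr> markup so it forms a complete HTML document."""
    html = fragment.strip()
    # Bare cells need a row around them first.
    if re.match(r"<(td|th)\b", html, re.IGNORECASE):
        html = f"<tr>{html}</tr>"
    # Bare rows need a table around them.
    if re.match(r"<tr\b", html, re.IGNORECASE):
        html = f"<table><tbody>{html}</tbody></table>"
    return f"<!DOCTYPE html><html><head></head><body>{html}</body></html>"


class CellCollector(HTMLParser):
    """Collect the whitespace-normalized text of every <td> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.cells = []
        self._buffer = None  # list of text chunks while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            self.cells.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_cells(fragment: str) -> list:
    """Wrap the fragment, parse it, and return the text of each <td>."""
    parser = CellCollector()
    parser.feed(wrap_table_fragment(fragment))
    parser.close()
    return parser.cells


# Hypothetical sample row; the real page markup was not shared in the thread.
sample_row = "<tr><td>239188988</td><td>08.01.2024 17:00</td><td>DAGRA</td></tr>"
print(extract_cells(sample_row))
```

In n8n itself, a simpler route may be to select the `td` cells directly from the full page in the first extraction step, so the table context is never lost. Whether that resolves this particular case is not confirmed in the thread.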

system · April 8, 2024, 1:32pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.