HTML_Extract : [Empty Array]

fischera · November 23, 2022, 7:56pm

Hello n8n tribe,
I’m trying to extract information from a website (https://www.clinicaltrialsregister.eu/ctr-search/search?query=Schizophrenia+OR+PTSD&country=de).
I’m almost there, but in the last extraction node, where I’m trying to pull out Sponsor Name and Medical Condition, I get an “empty array” result.

Here is my Workflow:

For the node’s configuration, I’ve used the CSS selector mentioned in this thread: (HTML_Extract: How to do css selector to get Row1 & Highlighted Row X). But after many attempts, it’s still not working.

Do you have any idea? I’m completely stuck.

Thanks,

MutedJam · November 24, 2022, 3:44pm

Hi @fischera, it seems your current workflow is producing incomplete tables, which are especially problematic. In my experience, this causes many table-related selectors (stuff like td) to not work as expected. In addition, a selector copied from your browser would be relative to the entire page, not just your result.

So, perhaps you might want to try an approach where you wrap each result back in a <table>, then use hand-picked selectors for these results?

Like so:

This example flow would return a list like this in the end:

I’ve tried to find the most human-readable selector possible, so this example uses a :contains selector that searches for a specific text value in each table cell. This isn’t an official CSS selector afaik, but n8n supports it, so I am not asking any further questions here. You should be able to easily reuse this to find additional fields.
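
For anyone who wants to try the two ideas outside n8n, here is a minimal Python sketch (not the workflow itself). It wraps a bare run of rows back in a `<table>` and then looks up a value by the label text in its cell, roughly what a `:contains` lookup does. The helper names `wrap_in_table`, `CellCollector` and `value_for_label` are made up for this illustration, and the sample fragment is hypothetical: it assumes the label and its value sit in the same cell, which may not match the real page. Python's `html.parser` is lenient about stray rows, so the wrapping step matters mainly for stricter parsers.

```python
from html.parser import HTMLParser
from typing import List, Optional


def wrap_in_table(fragment: str) -> str:
    """Put a bare run of <tr>/<td> markup back inside a complete table."""
    stripped = fragment.strip()
    if stripped.lower().startswith("<table"):
        return stripped  # already a complete table, leave it alone
    return "<table><tbody>" + stripped + "</tbody></table>"


class CellCollector(HTMLParser):
    """Collect the text content of every <td>/<th> cell in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.cells.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. a <span>) is still part of the cell.
        if self._in_cell:
            self.cells[-1] += data


def value_for_label(html: str, label: str) -> Optional[str]:
    """Return the text that follows `label` in the first cell containing it."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    for cell in parser.cells:
        text = " ".join(cell.split())  # collapse whitespace and newlines
        if label in text:
            return text.split(label, 1)[1].lstrip(" :").strip()
    return None


# Hypothetical fragment: label and value share one cell (an assumption).
fragment = """
<tr><td><span>Sponsor Name:</span> Example Pharma GmbH</td></tr>
<tr><td><span>Medical condition:</span> Schizophrenia</td></tr>
"""
result = wrap_in_table(fragment)
print(value_for_label(result, "Sponsor Name"))
print(value_for_label(result, "Medical condition"))
```

Adding another field should then only need another label string, as long as the page really follows the assumed label-then-value layout.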

Hope this helps! Let me know if you have any questions on this example (though I’ll only be back next week, so might take a bit for me to check back on this)

fischera · November 24, 2022, 5:23pm

Excellent @MutedJam - Thank you so much, this is exactly what I was looking for.
I understand the logic behind it thanks to your explanations.
I just have one question regarding the “Full page” node: is the 20 you entered manually an arbitrary value?
Is there a way to change it so that it checks whether any pages are left instead of using a fixed number?
Again, thank you so much, and have a great time off!
Arnaud

MutedJam · November 28, 2022, 12:10pm

So, the 20 is not an arbitrary value, it’s just how many results a full page would have. It might not be the ideal choice, but I can’t think of any way to use the actual page number here.

But perhaps you can check if there is a next page link if you are having trouble with the results number? Might be a bit more robust:

You could also consider adding a max loop count (for example, checking if {{ $runIndex }} is below 10 or something) as an additional safeguard.
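
To make the stop condition concrete, here is a rough Python sketch of the loop logic outside n8n: keep fetching while a next-page link was found, and give up after a fixed number of runs regardless. `collect_all_results` and `fetch_page` are hypothetical names for this illustration, `fetch_page` stands in for the HTTP request plus extraction steps, and the cap of 10 is just the example value from above.

```python
from typing import Callable, List, Tuple

# (results found on this page, whether a next-page link was found)
Page = Tuple[List[str], bool]


def collect_all_results(fetch_page: Callable[[int], Page], max_runs: int = 10) -> List[str]:
    """Fetch page after page until there is no next link or the cap is hit."""
    results: List[str] = []
    run_index = 0  # plays the role of n8n's zero-based $runIndex
    while run_index < max_runs:  # safeguard against endless loops
        page_results, has_next_link = fetch_page(run_index + 1)
        results.extend(page_results)
        if not has_next_link:
            break  # last page reached, stop regardless of the result count
        run_index += 1
    return results


# Hypothetical stand-in for the real request: three pages of fake results.
def fake_fetch(page_number: int) -> Page:
    rows = ["trial-%d-%d" % (page_number, i) for i in range(2)]
    return rows, page_number < 3


print(len(collect_all_results(fake_fetch)))
```

Checking for the link avoids the edge case where the last page happens to contain exactly 20 results, and the run cap keeps the loop finite if the link check ever misfires.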
