Does the 'Extract HTML content'-node support advanced CSS selectors?

Hello,

I’m trying to bring in data from a website using the ‘Extract HTML Content’-node.

I’m using a CSS selector that selects a data-attribute using:
.class[data-list='XXXX']

This doesn’t seem to result in any data.

I’m forced to work like this as the website uses generic classes that repeat themselves multiple times.

Anyone have experience with this? Any workarounds?

Hey @LanderC hope all is good!

Would you like to share the website and what you are trying to scrape?

Sure Jabbson!

It’s a Dutch news website. They have a dedicated section for ‘most read’ news articles:
→ hln.be

The section with title ‘Meest gelezen’ uses generic classes found on different parts of the homepage.

This CSS selector should work:
.col--secondary .fjs-sticky-container .widget__content .ankeiler__link[data-list='meest gelezen']

But it doesn’t in n8n somehow.

Thx for the response!

It appears the website has a firewall that kicks in after several attempts to fetch the page. Have you come across this already?

Forbidden - perhaps check your credentials?
Access Denied Access Denied
Your request was blocked by DPG Media's Web Application Firewall.

Nonetheless, if you are after the articles in the Meest gelezen section, the selector you are looking for is simply:

a[data-list='meest gelezen']
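To illustrate what that attribute selector matches, here is a minimal Python sketch using only the standard library. It is not how the n8n node works internally, just a way to see the matching logic on its own. The sample markup, the second data-list value ("net binnen"), and the helper names are made up for illustration; only the class name and the 'meest gelezen' value come from this thread.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on the class names quoted in this thread.
SAMPLE_HTML = """
<div class="widget__content">
  <a class="ankeiler__link" data-list="meest gelezen" href="/article-1">First article</a>
  <a class="ankeiler__link" data-list="net binnen" href="/article-2">Other article</a>
  <a class="ankeiler__link" data-list="meest gelezen" href="/article-3">Second article</a>
</div>
"""


class AttributeLinkCollector(HTMLParser):
    """Collects href and text of <a> elements whose attribute equals a given value.

    This mimics the effect of a selector such as a[data-list='meest gelezen'].
    """

    def __init__(self, attr_name, attr_value):
        super().__init__()
        self.attr_name = attr_name
        self.attr_value = attr_value
        self.matches = []
        self._current = None  # the matching <a> currently being read, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get(self.attr_name) == self.attr_value:
            self._current = {"href": attr_map.get("href"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.matches.append(self._current)
            self._current = None


def select_links(html, attr_name, attr_value):
    """Return the links in `html` whose `attr_name` attribute equals `attr_value`."""
    parser = AttributeLinkCollector(attr_name, attr_value)
    parser.feed(html)
    parser.close()
    return parser.matches


links = select_links(SAMPLE_HTML, "data-list", "meest gelezen")
for link in links:
    print(link["href"], link["text"])
```

Note that the attribute value is wrapped in straight quotes. Curly quotes pasted from a rich-text editor are not valid CSS string delimiters, so it is worth checking for them when a selector unexpectedly returns nothing.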
