N8n - Puppeteer - leboncoin.fr

Morganorix · December 12, 2024, 9:11pm

Hello all,
I’m new to n8n. I’m trying to make a request to the leboncoin.fr website.
I’m stuck on a dynamic captcha on the site. I installed Puppeteer in n8n, but after several unsuccessful attempts it still doesn’t work. Do you have any ideas on how to do this? Has anyone succeeded with something similar?

For the curious: my goal is to run several filtered searches to find a house to buy in different areas, then collect the links matching these criteria in a Google Sheet, for example, or have them sent to me as a newsletter.
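As a rough sketch of the "several filtered searches" idea, the snippet below builds one search URL per area. The filter names and values are copied from the search URL quoted in the error message later in this thread. `BASE_URL`, `COMMON_FILTERS` and `buildSearchUrls` are hypothetical names used only for illustration, and using `https` for the base URL is an assumption.

```javascript
// Search page as it appears in the quoted referer (https is an assumption).
const BASE_URL = "https://www.leboncoin.fr/recherche";

// Filters shared by every search; values copied from the URL quoted in the thread.
const COMMON_FILTERS = {
  category: "9",
  real_estate_type: "1",
  immo_sell_type: "new,old",
  outside_access: "garden",
  price: "min-100000",
  land_plot_surface: "1000-max",
  owner_type: "all",
  sort: "relevance",
};

// Build one search URL per area. Each entry in `locations` is a location
// token in the format the site itself puts in the `locations` parameter.
function buildSearchUrls(locations, filters = COMMON_FILTERS) {
  return locations.map((location) => {
    const url = new URL(BASE_URL);
    // URLSearchParams percent-encodes each value exactly once.
    url.search = new URLSearchParams({ ...filters, locations: location }).toString();
    return url.toString();
  });
}

const urls = buildSearchUrls([
  "Saint-Éloy-les-Mines_63700__46.16245_2.83362_4261_20000",
]);
```

Adding another area would then only mean adding another location token to the array.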

Kool_Baudrillard · December 15, 2024, 8:54pm

Hi,

you can try something like this:

Your directory site is built on Next.js. Next.js stores a JSON object inside the document, which is then used to populate the page.

Here you just use the selector from your dev tools in the HTML node, and it returns the JSON.

Then some regex and a small script in a Code node split out the results.
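A minimal sketch of that last step, written for a Code node. It assumes the page embeds its data in the standard Next.js `<script id="__NEXT_DATA__">` tag and pulls it out with a regex rather than the HTML node. The path `props.pageProps.searchData.ads` and the fields `subject` and `url` are hypothetical placeholders; check the real structure in your dev tools.

```javascript
// Pull the JSON that Next.js embeds in <script id="__NEXT_DATA__"> out of raw HTML.
function extractNextData(html) {
  const match = html.match(
    /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/
  );
  if (!match) return null; // no embedded data, e.g. a block page was returned
  return JSON.parse(match[1]);
}

// Walk a dotted path such as "props.pageProps.searchData.ads".
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((acc, key) => (acc == null ? undefined : acc[key]), obj);
}

// Return one n8n item per ad. The default path and the field names
// `subject` and `url` are assumptions, not confirmed by the site.
function splitAds(html, adsPath = "props.pageProps.searchData.ads") {
  const data = extractNextData(html);
  const ads = data ? getPath(data, adsPath) : undefined;
  if (!Array.isArray(ads)) return [];
  return ads.map((ad) => ({ json: { title: ad.subject, url: ad.url } }));
}
```

In a Code node this could end with something like `return splitAds($input.first().json.data);`, assuming the previous node put the page HTML in a `data` field.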

This is the point where you will get blocked, at the latest, because of DataDome protection. If you do a lot of scraping, DataDome is a bigger pain in the a** than Cloudflare. You could use a service like ZenRows, Bright Data, etc., which offer ways to bypass DataDome and similar protections.

Usually Puppeteer, even in stealth mode, isn't sufficient because of the checks that services like DataDome perform (e.g. whether any audio devices are present, fingerprinting, etc.).

Morganorix · December 20, 2024, 8:48am

Nice thx !!

Morganorix · December 21, 2024, 4:38pm

When I try with the HTTP Request node, I get an error message:

Forbidden - perhaps check your credentials?

403 - "{\"url\":\"https://geo.captcha-delivery.com/captcha/?initialCid=AHrlqAAAAAMADKGt1sOlozgAUkLfgQ==&cid=2HJNRrkLLfpjG6wpmuLYbYlptwJ7e27xZxjRc9j7KETSm8q8doU7m4Is0TRipnh9nrkXTKr6k1U1oj5Twg3Kh5_D5h0XmzSMWdX7SPZDl2VxbIq_GBRkqXIyv9sngTFM&referer=http%3A%2F%2Fwww.leboncoin.fr%2Frecherche%3Fcategory%3D9%26locations%3DSaint-%2525C3%252589loy-les-Mines_63700__46.16245_2.83362_4261_20000%26real_estate_type%3D1%26immo_sell_type%3Dnew%252Cold%26outside_access%3Dgarden%26price%3Dmin-100000%26land_plot_surface%3D1000-max%26owner_type%3Dall%26sort%3Drelevance&hash=05B30BD9055986BD2EE8F5A199D973&t=fe&s=2089&e=f20002bd6ea292bb193c9f4334db00d3383f72301b33cfffa3055b902259be80\"}"
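The body of that 403 is a small JSON object whose `url` points at `geo.captcha-delivery.com`, so a workflow can at least recognise this kind of block and stop cleanly instead of parsing an empty page. A minimal sketch, with `getCaptchaUrl` as a hypothetical helper name:

```javascript
// Return the captcha URL if a response looks like the challenge quoted above,
// otherwise null.
function getCaptchaUrl(status, body) {
  if (status !== 403) return null;
  let parsed = body;
  // The body may be a JSON string, or JSON encoded twice as in the quoted error.
  for (let i = 0; i < 2 && typeof parsed === "string"; i++) {
    try {
      parsed = JSON.parse(parsed);
    } catch {
      return null; // not JSON, so not the challenge format shown above
    }
  }
  if (!parsed || typeof parsed.url !== "string") return null;
  try {
    const host = new URL(parsed.url).hostname;
    const isCaptchaHost =
      host === "captcha-delivery.com" || host.endsWith(".captcha-delivery.com");
    return isCaptchaHost ? parsed.url : null;
  } catch {
    return null; // `url` was not a valid absolute URL
  }
}
```

A non-null result means the request was challenged, and retrying immediately from the same IP is unlikely to help.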

Kool_Baudrillard · December 21, 2024, 6:52pm

Could you post your workflow here? How are you running n8n?

Morganorix · December 22, 2024, 5:30pm

I did some more tests and organized my variables better.

The puppeteer config:

leonardogrig · December 23, 2024, 2:33am

These types of scraping integrations might work on some servers and not on others. DigitalOcean might work for a while and then need a proxy, while AWS might simply not work at all. If it’s something you can spend a bit more on, try Firecrawl!

Morganorix · December 23, 2024, 6:33pm

I’ve tested it and it is blocked by leboncoin.fr (geo-captcha). Not easy lol

Morganorix · December 25, 2024, 11:19pm

I’ve made good progress.
I manage to reach the site once and then I get blocked. As I understand it, the captcha blocks the IP for a few hours.
Since I only need to visit the site once a day, that’s fine for me.
There is detection based on the User-Agent. I also created a Docker instance running browserless and set it as the Browser WebSocket Endpoint in Puppeteer.
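Outside the n8n node, the same setup (a remote browserless instance plus an explicit User-Agent) could look roughly like the sketch below. `fetchHtml` is a hypothetical helper; the Puppeteer module is passed in as an argument, and the endpoint, User-Agent string and `networkidle2` wait condition are assumptions to adapt.

```javascript
// Load a page through a remote browser (e.g. a browserless container) and
// return its HTML. `puppeteer` is injected so the caller decides how it is loaded.
async function fetchHtml(puppeteer, { wsEndpoint, userAgent, url }) {
  // Attach to an already running browser instead of launching a local one.
  const browser = await puppeteer.connect({ browserWSEndpoint: wsEndpoint });
  try {
    const page = await browser.newPage();
    await page.setUserAgent(userAgent); // the site checks the User-Agent
    await page.goto(url, { waitUntil: "networkidle2" });
    return await page.content();
  } finally {
    // disconnect() detaches from the remote browser without shutting it down.
    await browser.disconnect();
  }
}

// Example call (all values below are placeholders):
// const puppeteer = require("puppeteer-core");
// fetchHtml(puppeteer, {
//   wsEndpoint: "ws://localhost:3000",
//   userAgent: "<a User-Agent string copied from your own desktop browser>",
//   url: "https://www.leboncoin.fr/recherche?category=9",
// }).then((html) => console.log(html.length));
```

Since the block seems to last a few hours per IP, a single run per day fits this pattern better than retries.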

system · March 25, 2025, 11:20pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.