Website scraping - help pls

Greetings everyone, I hope you can guide me on how to scrape a website like this:

I am using n8n on other websites without issues, but this one just does not load properly when using the HTTP Request node. What is the trick to getting this one scraped?

Thank you ahead for sharing!

Welcome to the community @koko!

I did not check that page deeply, but the reason will probably be that JavaScript is involved. The HTTP Request node just loads the HTML of the page and does not execute any JavaScript. To make it work, you would either have to use an external API which renders everything and then returns the resulting HTML (those are paid services), or check out some other posts in this forum which talk about using Puppeteer with n8n. But that is for sure some more work, and will require some deeper technical knowledge to get it working.
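To see concretely why a plain fetch is not enough, here is a small offline sketch (the sample page and file names are made up for illustration). A plain download returns the HTML byte-for-byte, with the JavaScript still sitting unexecuted inside its script tag:

```shell
# Stand-in for a JavaScript-driven page: the visible content only
# exists after a browser runs the script.
cat > page.html <<'EOF'
<html><body><div id="app"></div>
<script>document.getElementById('app').textContent = 'Price: 9.99';</script>
</body></html>
EOF

# A plain fetch (effectively what the HTTP Request node does):
curl -s "file://$PWD/page.html" -o raw.html

# The result is the source verbatim -- the script has not been run,
# so the #app div is still empty. A headless browser or a rendering
# API is needed to get the final DOM.
grep -q '<script>' raw.html && echo "script tag still present: JS not executed"
```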


Thanks a lot @jan for your response. Even though it’s not what I was hoping for, you clarified it for me. This is much appreciated. A good community is key!

It is NOT a JavaScript issue; it has something to do with how this site is secured.
I’ve run a number of tests:
- Postman on Windows - works
- Postman online - doesn’t work
- wget on Windows - doesn’t work
- wget on Windows with a user agent - works
- curl on Windows - doesn’t work
- curl on Linux - doesn’t work, but returns “access denied”

The verbose logs suggest it has something to do with SSL/TLS, but I don’t have time to dig further.
If you are at least a little techie and it’s a life-or-death situation for you, describe your problem to ChatGPT and move on from there step by step.
No way in hell they can secure it 100% :slight_smile:
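For what it’s worth, the “with a user agent” variant of the tests above can be sketched as a small helper. Note that `fetch_as_browser` and the User-Agent string are illustrative choices, not part of any tool:

```shell
# Hypothetical helper: fetch a URL while impersonating a desktop browser.
# Some site protections block the default curl/wget User-Agent outright.
fetch_as_browser() {
  local url="$1" out="$2"
  # -A sets the User-Agent, -s hides progress output, -f makes curl
  # return a non-zero exit code on HTTP errors.
  curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36" \
       -sf "$url" -o "$out"
}

# Usage (placeholder URL):
# fetch_as_browser "https://example.com/product" page.html
```

If the server also fingerprints the TLS handshake (not just the header), a user agent alone may not be enough, which would match the mixed curl/wget results above.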


Thanks a lot @mikeon !

I guess in this case it should also work with n8n if the User Agent is set.

I failed to make it work on Linux with or without a user agent. I’m guessing your n8n runs on Linux (Docker), and it probably uses curl or similar.

@mikeon, thanks for all your efforts, much appreciated! I guess I am going to try the ChatGPT path :wink:

Regarding the User Agent suggestion, I did try it before I posted my question here. I did not have much luck with it either, but I will give it another go. Not much I can lose. :thinking:

@mikeon, how about that: I managed to get what I need using wget.

/usr/bin/wget -U "<USER AGENT>" "{{ $json.ProductUrl }}" -O file.html 2>/dev/null 

After this it was an easy task to read the binary file and extract the content.
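In case it helps anyone else, the extraction step after the wget download can be as simple as a sed one-liner. The `<title>` selector and the sample page below are just examples; `file.html` matches the `-O` target in the wget command above:

```shell
# Stand-in for the page saved by wget (-O file.html above):
cat > file.html <<'EOF'
<html><head><title>Sample Product - Shop</title></head>
<body><h1>Sample Product</h1></body></html>
EOF

# Pull the <title> text out of the saved page. This assumes the tag
# and its contents sit on one line, which is common but not
# guaranteed -- adapt the pattern to the element you need.
title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' file.html | head -n 1)
echo "$title"
```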


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.