I’m trying to fetch the RSS feed from openai.com/news and scrape its articles using the HTTP Request node, but both requests return 403. It works from a browser, Postman and even Make.com, but n8n is refused.
I’ve tinkered with setting headers and tried to set up a proxy, and failed miserably. Only afterwards did I realise that it works from Make.com, which got me thinking: shouldn’t it work without a proxy from n8n cloud as well?
It looks like they have some kind of scrape shield in place (which doesn’t make sense for an RSS feed, IMHO). Like you, I can access that page from my browser, but not from our cloud instance. I’ll ask our engineers if they have any ideas.
Yeah, it’s behind a Cloudflare shield that blocks scraping. If you’re in the ‘wrong’ IP range, it’ll block cURL requests too. There’s nothing we can do about it, sorry.
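For anyone who wants to check whether a 403 like this comes from Cloudflare rather than from the site itself, here is a minimal sketch. It assumes that Cloudflare-served responses carry a `Server: cloudflare` or `CF-RAY` header, which is a heuristic and not a guarantee. The function names and the example.com URL are placeholders of mine, not anything from n8n or OpenAI; substitute the feed URL you are actually fetching.

```python
import urllib.error
import urllib.request


def looks_like_cloudflare_block(status, headers):
    """Heuristic (an assumption): a 403 that also carries Cloudflare headers."""
    lowered = {k.lower(): v for k, v in headers.items()}
    served_by_cf = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    return status == 403 and served_by_cf


def probe(url, user_agent="Mozilla/5.0"):
    """Fetch url once and return (status, headers), including on HTTP errors."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # A 403 is raised as HTTPError; its headers are still available.
        return err.code, dict(err.headers.items())


if __name__ == "__main__":
    # Placeholder URL: replace with the feed you are trying to read.
    status, headers = probe("https://example.com/feed.xml")
    print(status, looks_like_cloudflare_block(status, headers))
```

If this reports a Cloudflare block from the server but not from your own machine, the refusal is most likely tied to the IP range, in which case changing headers in the workflow will not help.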