RSS 403 Openai.com/news, n8n cloud

Hi folks,

I’m trying to fetch the RSS feed from openai.com/news and scrape their articles using the HTTP Request node, but both requests return 403. It works via browser, Postman and even Make.com, but n8n is being refused.

I’ve tinkered with setting headers and tried to set up a proxy, and failed miserably at it. It was only afterwards that I realised it works from Make.com, which got me thinking: shouldn’t it work without a proxy on n8n cloud as well?

Any pointers or help would be much appreciated!

Information on your n8n setup

  • n8n version: [email protected]
  • Database (default: SQLite): default, I guess?
  • n8n EXECUTIONS_PROCESS setting (default: own, main): no idea
  • Running n8n via (Docker, npm, n8n cloud, desktop app): n8n cloud
  • Operating system: probably

It looks like they have some kind of scrape shield installed (which doesn’t make sense for an RSS page imho). I can access that page from my browser like you, but not from our cloud instance. I’ll ask our engineers if they have any ideas.

Yeah, it’s behind a Cloudflare shield that blocks scraping. If you’re in the ‘wrong’ IP range, it’ll block curl commands too. There’s nothing we can do about it, sorry.
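
For anyone who wants to confirm that a 403 like this comes from Cloudflare rather than from the site itself, here is a minimal sketch in Python. It relies on the fact that responses passing through Cloudflare normally carry a `Server: cloudflare` header and a `CF-RAY` header. The function name `looks_like_cloudflare_block` is made up for illustration, and the check is a heuristic, not a guarantee.

```python
from typing import Mapping


def looks_like_cloudflare_block(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: a 403 whose response headers point at Cloudflare.

    Hypothetical helper for illustration only. It inspects the status code
    and response headers you already have; it does not make any request.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "").strip().lower() == "cloudflare"
        or "cf-ray" in lowered
    )
    return status == 403 and served_by_cloudflare
```

You can feed it the status code and headers from whatever client you are testing with. If it returns `True` from one network and the same request succeeds from another, an IP-based block is the likely cause.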


Thanks for the reply.

Agreed, it’s very strange to have a non-scrapeable RSS feed! I got my proxy solution working to get around the issue.
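
The thread doesn’t show the actual proxy setup, so the following is only a rough sketch of the general idea, written in Python with the standard library: send the request through a proxy whose IP is not blocked, with browser-like headers. The proxy address, the feed URL and the function names are all placeholders, not values from this thread.

```python
import urllib.request

# Assumed, browser-like headers; the exact values are illustrative.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "application/rss+xml, application/xml;q=0.9, */*;q=0.8",
}


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def build_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Create an opener that routes HTTP and HTTPS traffic via proxy_url."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(proxy)


def fetch_feed(url: str, proxy_url: str, timeout: float = 15.0) -> bytes:
    """Fetch the raw feed bytes through the proxy.

    Raises urllib.error.HTTPError if the server still answers with 403.
    """
    opener = build_opener(proxy_url)
    with opener.open(build_request(url), timeout=timeout) as response:
        return response.read()


# Example call with placeholder values (not run here):
# xml_bytes = fetch_feed("https://example.com/rss.xml",
#                        "http://proxy.example.com:8080")
```

Whether this gets through depends on the proxy’s IP range, since that is what the block appears to key on. Headers alone did not help in my case.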

A bit ironic that OpenAI, of all sites, is being so anti-bot :smiley:

