I’m trying to scrape a website. I can fetch the content with curl (using the --compressed flag), but when I set up an HTTP Request node and import the curl command, it always runs without returning any results.
curl 'https://www.kobo.com/' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:107.0) Gecko/20100101 Firefox/107.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' \
  -H 'Accept-Language: zh-TW,zh;q=0.8,en-US;q=0.5,en;q=0.3' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  --compressed
Please share the workflow
Information on your n8n setup
n8n version: 0.200.1
Database you’re using (default: SQLite): SQLite
Running n8n with the execution process [own(default), main]: own
Running n8n via [Docker, npm, n8n.cloud, desktop app]: Docker
I just tried running your workflow and never got a response from the server when executing it on my web server, but I did get the expected response when running it on a local n8n instance.
So it seems the server you are trying to scrape is rather picky about which clients it accepts requests from. Is there a chance you ran your curl command locally but have your n8n instance running on a remote server?
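One way to test this theory is to run the same request from the machine that hosts n8n (for example via SSH, or `docker exec` into the n8n container) and compare the result with a run from your laptop. A minimal sketch, assuming curl is available on both machines; the `probe` helper name is made up for illustration:

```shell
# Print only the HTTP status and downloaded size for a URL, discarding the body.
probe() {
  curl -sS -o /dev/null --connect-timeout 10 --compressed \
    -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:107.0) Gecko/20100101 Firefox/107.0' \
    -w 'HTTP %{http_code}, %{size_download} bytes\n' "$1"
}

# Run this on both machines and compare the output.
probe 'https://www.kobo.com/'
```

If the remote machine prints `HTTP 000` or times out while your local run returns `HTTP 200` with a non-zero size, the site is most likely blocking the server’s IP range rather than anything n8n does.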
This is by no means a recommendation for a specific product, but I have used Webshare proxies in the past to scrape certain websites that have geo-blocking in place. This worked reasonably well with n8n, using a value such as socks5://$username:$password@p.webshare.io:8780 in the “Proxy” field of the HTTP Request node. I can’t promise this will work for you, but it might be worth a shot.
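For reference, a sketch of how that proxy value is assembled and how you might sanity-check it outside n8n before pasting it into the node. The credentials are placeholders, and the host/port are assumed from the value quoted above, so substitute your own account details:

```shell
# Placeholder credentials — replace with your own Webshare username/password.
PROXY_USER='username'
PROXY_PASS='password'
# Host and port as quoted above; your account may use different ones.
PROXY_URL="socks5://${PROXY_USER}:${PROXY_PASS}@p.webshare.io:8780"
echo "$PROXY_URL"

# The resulting value goes into the HTTP Request node's "Proxy" field.
# You can verify the proxy itself works with curl first, e.g.:
#   curl --proxy "$PROXY_URL" --compressed 'https://www.kobo.com/'
```

If the curl check through the proxy succeeds but the node still returns nothing, the problem is elsewhere in the node configuration rather than the proxy.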