HTTP request - Rejected (cloudflared?)

After completing my giant PDF downloader workflow, which uses n8n + Python + Selenium Grid to get PDFs from a range of sites, I have run into a problem where it seems like I have been blocked, or falsely accused of DDoS attacking?

The workflow uses Selenium to navigate to the website, input the search parameters, and return the download URL; then I pass this URL to an n8n HTTP Request node to pick up the file, rename it, and file it away in the appropriate network folder. I do it this way because I'm too lazy to convert the file into binary and send it back in the JSON payload. The Selenium browser doesn't seem to have any problem with getting blocked; it is the n8n node that is rejected after a number of runs.
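For context, the download side of the hand-off in Python looks roughly like this. It is a sketch, not my actual code: the function names, the fallback file name, and the User-Agent header are all placeholders.

```python
# Sketch of the hand-off described above: Selenium returns a download URL,
# which is then fetched and filed away into a network folder.
import os
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def target_path(download_url: str, network_folder: str) -> str:
    """Derive the destination path from the file name in the URL."""
    name = os.path.basename(urlparse(download_url).path) or "download.pdf"
    return os.path.join(network_folder, name)


def fetch_pdf(download_url: str, network_folder: str) -> str:
    """Download the PDF and save it under the appropriate folder."""
    dest = target_path(download_url, network_folder)
    # Browser-like User-Agent header; the real workflow's headers may differ.
    req = Request(download_url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```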

It happens when the HTTP request is made too often (after about 35 consecutive runs).

Is there any way I can prevent this from happening?

Will swapping user agents be sufficient, or will I require a new IP address via VPN? In the past I have swapped to the Googlebot user agent, which helped with a temporary block, but I think even that is not enough, because it still got blocked after a large number of runs.
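If rotating user agents does turn out to be enough, a minimal round-robin sketch could look like this (the agent strings below are illustrative, not a vetted pool):

```python
# Round-robin rotation over a small pool of desktop user agents,
# so that consecutive requests don't all carry the same header.
from itertools import cycle

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]
_ua_pool = cycle(USER_AGENTS)


def next_user_agent() -> str:
    """Return the next user agent in round-robin order."""
    return next(_ua_pool)
```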

Describe the issue/error/question

What is the error message (if any)?

{"status":"rejected","reason":{"message":"connect ECONNREFUSED 127.0.0.1:80","name":"Error","stack":"Error: connect ECONNREFUSED 127.0.0.1:80\n at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1247:16)","code":"ECONNREFUSED"}}

Please share the workflow

Share the output returned by the last node

Information on your n8n setup

  • running latest version on Ubuntu Linux 22.04 in a Docker container
  • Default database: SQLite

Hey Josh,
there is no easy answer for your scenario, as it depends on the rate-limiting strategies of the websites you are making HTTP requests against.

This could be an indicator that setting a user agent to make it look like you're using a browser could help. Not sure though.

You could try using the Wait node in between requests for throttling. Most of the time websites use a rate limit of X requests per minute.
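If you want the same throttling on the Python/Selenium side instead of the Wait node, a minimal sketch might look like this, assuming a fixed minimum gap between requests (the interval value is just an example):

```python
# Simple throttle: enforce a minimum interval between consecutive requests.
import time


class Throttle:
    """Ensure at least `interval` seconds pass between calls to wait()."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough to honor the minimum interval.
        delta = time.monotonic() - self._last
        if delta < self.interval:
            time.sleep(self.interval - delta)
        self._last = time.monotonic()
```

Call `throttle.wait()` right before each download; the first call returns immediately, and subsequent calls sleep as needed.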

I don’t know :sweat_smile: I’ve seen other web-scraping tools switch IPs to circumvent being blocked by IP, but I’ve never used that myself.

Sorry if I couldn’t be of much help.


Thank you for your response!

Yes, it's a complicated situation. So far I've asked the staff to split up the jobs into about 40 per file and added a Slack notification to tell them that the process has started and not to try again until it's finished.
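The batching side is essentially just chunking the job list before it goes into the workflow. A sketch (the function name is made up; the size of 40 matches what I described above):

```python
# Split a list of jobs into batches of at most `size` items each,
# so a single run stays under the threshold that triggers the block.
def batches(items: list, size: int = 40) -> list:
    """Return consecutive chunks of `items`, each at most `size` long."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```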

But it would be nice if someone with some solid experience could provide input.

We may need to look at a VPN to change the IP, but that will also be hard because most of the VPNs I've used already have IPs that were flagged from the get-go; if you go on Google with them, it asks you to perform reCAPTCHAs.