Unable to generate correct pagination

Rafay_Saleem · June 24, 2025, 6:58am

I am unable to generate the correct order of pagination. The base URL pagination generates the correct pages in the input, but the output still uses the pages from base URL 1.

My goal is to take base URL 1, fetch the agents’ profiles page by page until they run out, and then move to base URL 2 and repeat the same process. Can anyone help me?

jabbson · June 24, 2025, 3:44pm

Hey @Rafay_Saleem, hope all is well. See the workflow below; I think it is pretty close to what you were looking for, only simplified.

You can also see the result of running this in your document under the TEST2 tab. Hope it helps.
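
For anyone who wants the logic outside of n8n, here is a minimal sketch of the order of operations asked about above: exhaust every page of one base URL, then move to the next. It is plain Python, not the n8n workflow, and the names `scrape_all`, `fetch_page`, `fake_fetch` and the example.com URLs are made up for illustration. It also assumes that an empty page means a base URL is finished, which may not match every site.

```python
from typing import Callable, Dict, Iterable, List


def scrape_all(base_urls: Iterable[str],
               fetch_page: Callable[[str, int], List[str]],
               max_pages: int = 100) -> List[str]:
    """Collect items from every page of one base URL before starting the next."""
    profiles: List[str] = []
    for base_url in base_urls:
        # The page counter restarts at 1 for each base URL.
        for page in range(1, max_pages + 1):
            items = fetch_page(base_url, page)
            if not items:
                # Assumption: an empty page means this base URL is done.
                break
            profiles.extend(items)
    return profiles


# Hypothetical stand-in for a real HTTP call: two "sites" with 2 and 1 pages.
FAKE_SITE: Dict[str, List[List[str]]] = {
    "https://example.com/city-a": [["a1", "a2"], ["a3"]],
    "https://example.com/city-b": [["b1"]],
}


def fake_fetch(base_url: str, page: int) -> List[str]:
    pages = FAKE_SITE.get(base_url, [])
    return pages[page - 1] if page <= len(pages) else []


print(scrape_all(list(FAKE_SITE), fake_fetch))  # ['a1', 'a2', 'a3', 'b1']
```

The key point is that the page number lives inside the per-base-URL loop, so it cannot carry over from base URL 1 into base URL 2.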

jabbson · June 24, 2025, 5:18pm

There was a little flaw; here is an updated version:

Rafay_Saleem · June 25, 2025, 4:02am

@jabbson Thank you so much for your help. I have a small question: what if I want to add a wait after each base URL completes, so the requests don’t get blocked?

jabbson · June 25, 2025, 4:07am

You can space them out with “Interval Between Requests” in the pagination options of the same HTTP Request node, but that would apply to all requests.

Alternatively, you can use batching with intervals, where you wait x ms after every y requests.
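
To make the "wait x ms after every y requests" idea concrete, here is a rough Python sketch of the technique itself, not of how n8n implements it. The function name `run_in_batches` and its parameters `batch_size` (y) and `pause_ms` (x) are assumptions chosen for this example, and `request` stands in for whatever actually performs the HTTP call.

```python
import time
from typing import Callable, List, Sequence


def run_in_batches(urls: Sequence[str],
                   request: Callable[[str], str],
                   batch_size: int = 5,
                   pause_ms: int = 1000,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call request(url) for each URL, pausing pause_ms after every batch_size calls."""
    results: List[str] = []
    for index, url in enumerate(urls, start=1):
        results.append(request(url))
        # Pause after each full batch, but not after the very last request.
        if index % batch_size == 0 and index < len(urls):
            sleep(pause_ms / 1000)
    return results


# Usage with a recorded "sleep" so the example runs instantly.
pauses: List[float] = []
out = run_in_batches(["u1", "u2", "u3", "u4", "u5"],
                     request=lambda u: u.upper(),   # hypothetical request function
                     batch_size=2, pause_ms=500,
                     sleep=pauses.append)
print(out)     # ['U1', 'U2', 'U3', 'U4', 'U5']
print(pauses)  # [0.5, 0.5]
```

Passing `sleep` as a parameter is only there to make the sketch easy to try without waiting; with the default, it calls `time.sleep`.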

Rafay_Saleem · June 25, 2025, 4:39am

@jabbson Thanks for your help, much appreciated.

Rafay_Saleem · June 25, 2025, 11:49am

@jabbson Is it possible to use an open-source web scraper with it? Since I am self-hosting n8n, I am curious whether I can use an open-source web scraping API instead of the native HTTP Request node.

jabbson · June 25, 2025, 2:32pm

Take a look and choose from this list of solutions I recently came across or heard of (I am sure there are more if you search):

Cheerio + HTTP Request Node (Built-in)
Puppeteer (via Docker or External API)
Scrapy (Python-based)
Playwright (via external API)
Use services like Browserless or ScrapingBee
Open-Source APIs with Web Scraping Functionality
Simple Scraper
Go-Scraper or ScrapFly

Rafay_Saleem · June 25, 2025, 3:55pm

@jabbson Can you provide a Cheerio or Puppeteer tutorial? I’m unable to find one.

jabbson · June 25, 2025, 4:14pm

This is what comes up in a quick Google search:

Ask YouTube too, there must be some guidance or step-by-step tutorials…

Rafay_Saleem · June 26, 2025, 12:05pm

Thanks @jabbson

Rafay_Saleem · June 28, 2025, 9:15am

Hello @jabbson, I installed the Puppeteer node, but when scraping websites like realtor.com or Zillow, I get a 429 error.
I’m using a self-hosted n8n setup with Docker and Portainer, cloud-hosted on Oracle, with the Puppeteer n8n node drudge/n8n-nodes-puppeteer (GitHub: n8n node for browser automation using Puppeteer).

Thank you!

jabbson · June 28, 2025, 3:06pm

429 (Too Many Requests) is a rate-limiting response code; try spacing your requests out in time.
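
One common way to space requests out after a 429 is to wait and retry with a growing delay, honoring the `Retry-After` header when the server sends one. The sketch below shows that idea in Python under some assumptions: `get_with_backoff` and `send` are hypothetical names, `send` is whatever performs the request and returns a `(status, headers, body)` tuple, and only the seconds form of `Retry-After` is handled (the header can also carry an HTTP date). It only helps with plain rate limits.

```python
import time
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status code, headers, body)


def get_with_backoff(url: str,
                     send: Callable[[str], Response],
                     max_retries: int = 4,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry on HTTP 429, waiting longer after each rate-limited attempt."""
    attempt = 0
    while True:
        response = send(url)
        status, headers, _body = response
        # Return on any non-429 status, or give up once retries are exhausted.
        if status != 429 or attempt >= max_retries:
            return response
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)          # server told us how long to wait
        else:
            delay = base_delay * 2 ** attempt   # exponential backoff: 1s, 2s, 4s, ...
        sleep(delay)
        attempt += 1


# Usage with a scripted fake server: two 429s, then success.
scripted: List[Response] = [(429, {"Retry-After": "3"}, ""), (429, {}, ""), (200, {}, "ok")]
delays: List[float] = []
final = get_with_backoff("https://example.com/listing",      # hypothetical URL
                         send=lambda _url: scripted.pop(0),
                         sleep=delays.append)
print(final[0], delays)  # 200 [3.0, 2.0]
```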

Rafay_Saleem · June 28, 2025, 4:16pm

@jabbson I did. I also checked by browsing the Realtor website, and it was working fine in the browser.

jabbson · June 28, 2025, 4:29pm

Well, you see, the people who run these services like their data, and they like it when you can’t get it, at least not easily. While one group of smart people is thinking about how to scrape all the data, make it available, and make money off of it, another group is thinking about how to protect themselves from exactly that. Bot detection is getting as sophisticated as web scraping, and it is a never-ending battle.

While on this topic, both services you’ve mentioned strictly prohibit data scraping and the use of automated tools to access or extract data from their platforms without explicit written permission. Doing so is unethical and can carry legal consequences.

Rafay_Saleem · June 28, 2025, 5:00pm

But Realtor provides an API as well, and I think it’s costly @jabbson

jabbson · June 28, 2025, 5:08pm

And that is exactly the point: if you want the data, they want you to pay for it, which is exactly why they will try their best to detect and stop any bot activity on their resources.