@Asim_Arman did a quick check on ur 3 sources before recommending anything and theres good news for one of them: Housecall Pro isnt actually on Cloudflare, and its /feed/ works fine if u just add a real browser User-Agent header to the HTTP Request node. tested with a Mozilla/5.0 UA against the Housecall Pro feed and got valid RSS XML back. ur 403 there is almost certainly because n8n’s HTTP Request node defaults to a generic UA (or none at all) and the site has a basic UA filter. switch to RSS Feed Read pointed at that feed with a User-Agent override and ur done for that one, no scraping needed. (if the RSS node in ur n8n version doesnt expose custom headers, HTTP Request with the UA header plus an XML node gets u the same result.)
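if u want to sanity-check the UA fix outside n8n first, heres a minimal Python sketch of the same idea: send a browser-like User-Agent, then pull title/link out of each RSS item. the UA string, the function names, and whatever feed URL u pass in are my own placeholders, not something from ur workflow, so treat it as a rough illustration and not the exact thing n8n does internally.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Union

# Assumption: any mainstream browser UA string should do; this one is illustrative.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def build_feed_request(feed_url: str) -> urllib.request.Request:
    """Build a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(feed_url, headers={"User-Agent": BROWSER_UA})


def parse_rss_items(xml_data: Union[str, bytes]) -> list:
    """Return a list of {title, link} dicts, one per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def fetch_feed(feed_url: str) -> list:
    """Fetch the feed with the browser UA and parse it (makes a real network call)."""
    with urllib.request.urlopen(build_feed_request(feed_url), timeout=30) as resp:
        return parse_rss_items(resp.read())
```

call `fetch_feed(...)` with the feed URL once with this header and once without; if only the bare request gets the 403, its a UA filter and not a Cloudflare challenge.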
Jobber (getjobber.com) is genuinely on Cloudflare (cf-ray header present in responses), and theres no RSS for the academy section. HVAC Informed also doesnt expose RSS. those two do need a scraping service.
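the cf-ray check is easy to script if u add more sources later. a minimal sketch, assuming u already have the response headers as a dict; the function name is made up, and the `server: cloudflare` test is an extra heuristic on top of the cf-ray one. note it only tells u the site sits behind Cloudflare, not whether a challenge will actually fire.

```python
def looks_like_cloudflare(headers: dict) -> bool:
    """Return True if the response headers carry the usual Cloudflare markers."""
    # Header names are case-insensitive, so normalise before checking.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    return "cf-ray" in lowered or lowered.get("server", "") == "cloudflare"
```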
on the bypass-cloudflare-natively question, no real way to do it in n8n. their challenge requires executing JS to solve proof-of-work plus behavioral fingerprinting, and the HTTP Request node is a plain http client with no browser engine. the cloudscraper-style libs from 2022-2023 are all broken against current Cloudflare too, so the few community nodes wrapping them dont work either.
browser-like headers fix basic UA filters (which is what Housecall Pro’s 403 actually was, ironically) but do nothing against the real Cloudflare challenge, which fingerprints the TLS handshake (JA3), request timing, and JS execution on top of the headers.
on ScrapingBee vs Apify vs BrightData: for ur use case (low-volume daily digest, just 2 sites needing real scraping) ScrapingBee or Firecrawl is the cleanest fit. rough comparison (per_request is approximate USD per request):
{
  "scrapingbee": {"per_request": "~$0.001-0.005", "fit": "low-mid volume, simple, fast"},
  "scraperapi": {"per_request": "~$0.001-0.003", "fit": "comparable to scrapingbee"},
  "firecrawl": {"per_request": "~$0.001", "fit": "scrape-to-markdown, LLM-ready output"},
  "brightdata": {"per_request": "~$0.005-0.02", "fit": "hardest targets, expensive but most reliable"},
  "apify": {"per_request": "varies", "fit": "complex multi-page flows"}
}
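to put those per-request numbers in context for a daily digest, heres a back-of-envelope calc. it assumes 2 scraped sites, one run per day, 30 days a month (my assumptions, adjust to ur schedule) and reuses the rough ranges from the comparison above. apify is left out since its pricing is listed as "varies". real bills depend on the plan and on options like JS rendering, so read this as an order-of-magnitude estimate only.

```python
# Rough USD-per-request ranges copied from the comparison above (low, high).
PER_REQUEST_USD = {
    "scrapingbee": (0.001, 0.005),
    "scraperapi": (0.001, 0.003),
    "firecrawl": (0.001, 0.001),
    "brightdata": (0.005, 0.02),
}


def monthly_cost(service: str, sites: int = 2, runs_per_day: int = 1, days: int = 30) -> tuple:
    """Return the (low, high) estimated monthly spend in USD for one service."""
    low, high = PER_REQUEST_USD[service]
    requests = sites * runs_per_day * days
    return round(low * requests, 4), round(high * requests, 4)


for name in PER_REQUEST_USD:
    low, high = monthly_cost(name)
    print(f"{name}: ${low:.2f} to ${high:.2f} per month")
```

at 60 requests a month every option lands at pocket-change level, so for this workload fit and reliability matter more than the per-request price.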
ScrapingBee called via HTTP Request looks like:
{
  "method": "GET",
  "url": "https://app.scrapingbee.com/api/v1/",
  "qs": {
    "api_key": "<your_key>",
    "url": "https://www.getjobber.com/academy/",
    "render_js": "true",
    "premium_proxy": "true"
  }
}
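same request as a small Python sketch, handy for testing ur key and the target URL before wiring it into n8n. the endpoint and query params are the ones from the config above; the function names and the timeout are my own choices. the main thing it shows is that the target URL has to be URL-encoded inside the query string, which n8n does for u when u use the query parameters section.

```python
import urllib.request
from urllib.parse import urlencode

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def scrapingbee_url(api_key: str, target_url: str) -> str:
    """Build the full request URL; urlencode escapes the nested target URL."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "render_js": "true",
        "premium_proxy": "true",
    }
    return f"{SCRAPINGBEE_ENDPOINT}?{urlencode(params)}"


def fetch_rendered_html(api_key: str, target_url: str) -> str:
    """Fetch the page through ScrapingBee (makes a real, billable network call)."""
    # Generous timeout is an assumption: JS rendering behind a proxy can be slow.
    with urllib.request.urlopen(scrapingbee_url(api_key, target_url), timeout=120) as resp:
        return resp.read().decode("utf-8", errors="replace")
```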
end state is one RSS Feed Read for Housecall Pro, two ScrapingBee HTTP calls for Jobber and HVAC Informed, then parse → digest. way cheaper than treating all 3 as Cloudflare-protected when only one actually is.
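for the final parse → digest step, something like this would do as a starting point. it assumes each source has already been parsed into a list of dicts with `title` and `link` keys; that item shape, the function name, and the per-source cap are all hypothetical, so swap in whatever ur parse step actually emits.

```python
def build_digest(items_by_source: dict, max_per_source: int = 5) -> str:
    """Merge parsed items from every source into one plain-text digest."""
    lines = []
    for source, items in items_by_source.items():
        if not items:
            continue  # skip sources that returned nothing today
        lines.append(f"## {source}")
        for item in items[:max_per_source]:
            lines.append(f"- {item['title']} ({item['link']})")
        lines.append("")  # blank line between sources
    return "\n".join(lines).strip()
```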