Seeking Advice on Self-Hosted Web Scraping and Monitoring with n8n

Henry_Junior · August 14, 2024, 6:40pm

I’m currently running a self-hosted version of n8n and am looking to self-host a solution that can monitor websites and scrape data, which I can then process through n8n.

I had difficulty finding a quick way to set up a self-hosted Browserless v2 instance with native n8n support. The main community package, n8n-nodes-browserless, appears to be outdated, unsupported, or incompatible. The official Browserless n8n integration seems to rely on the HTTP Request node. That sounds great, but I’m unsure what advantages I’d gain by using Browserless through the HTTP Request node compared to running something like changedetection.io with Browserless and having it send notifications to n8n.

Since I haven’t used the Browserless integration through the HTTP Request node, or Browserless on its own, I naively assumed it would be more convenient to set up and manage the web scraping and monitoring in a separate tool and then handle the processing of results in n8n.

For those reasons, I decided to self-host changedetection along with browserless/playwright using a slightly modified version of this Docker Compose setup.

I’ve configured changedetection to send notifications via Apprise, triggering webhooks in n8n with GET requests, and it’s working well.
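Since a GET request normally carries no body, everything changedetection passes to n8n this way has to travel in the URL’s query string. Here is a minimal Python sketch of what that URL looks like. It assumes a hypothetical n8n host `http://n8n.example.internal:5678` (5678 is n8n’s default port), a Webhook node with the path `changedetection`, and made-up parameter names `watch_url` and `title`; what your Apprise notification actually sends depends on how you configured it. `build_webhook_url` and `query_params` are hypothetical helpers, not part of n8n or changedetection.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_webhook_url(n8n_base: str, webhook_path: str, params: dict) -> str:
    """Return the GET URL a notifier would call to trigger an n8n Webhook node."""
    # Production webhooks are served under /webhook/<path>;
    # the editor's test URL uses /webhook-test/<path> instead.
    base = n8n_base.rstrip("/")
    path = webhook_path.strip("/")
    # urlencode percent-encodes values so URLs and spaces survive the trip.
    return f"{base}/webhook/{path}?{urlencode(params)}"


def query_params(url: str) -> dict:
    """Decode the query string the way a receiver would see it (first value per key)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


url = build_webhook_url(
    "http://n8n.example.internal:5678",
    "changedetection",
    {"watch_url": "https://example.com/pricing", "title": "Price changed"},
)
print(url)
# http://n8n.example.internal:5678/webhook/changedetection?watch_url=https%3A%2F%2Fexample.com%2Fpricing&title=Price+changed
print(query_params(url))
# {'watch_url': 'https://example.com/pricing', 'title': 'Price changed'}
```

The decoded values are what the Webhook node should expose as the request’s query parameters. For larger payloads, such as a full diff, a POST with a JSON body may be a better fit than a long query string.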

I have a few questions:

1. Before I dive deeper into setting up triggers in changedetection to activate workflows in n8n, am I overlooking any native support in n8n for Browserless or similar services that I should consider instead? Why or why not?
2. If anyone has tips on getting the most out of this integration, I’d love to hear them, especially if you’ve used a similar setup to build something interesting!
3. Are there any self-hosted scraping/notification integrations I might have missed that are worth considering? I’m running the latest version of n8n on Linux. I prefer to run n8n on a fairly lean server (that way it’s free on GCP), so I’d like to host any monitoring and scraping on a separate machine unless I’m missing out on something super useful.
4. Since I’ve set up changedetection with Browserless, can I also call Browserless directly from n8n using the HTTP Request node and get the best of both worlds? If so, how would I go about making that request?

Thanks for reading and any tips!

n8n version: 1.54.2
Database (default: SQLite): SQLite
n8n EXECUTIONS_PROCESS setting (default: own, main): own, main
Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
Operating system: Ubuntu 22.04.4 LTS

n8n · August 14, 2024, 6:40pm

It looks like your topic is missing some important information. Could you provide the following if applicable.

n8n version:
Database (default: SQLite):
n8n EXECUTIONS_PROCESS setting (default: own, main):
Running n8n via (Docker, npm, n8n cloud, desktop app):
Operating system:

Jon · August 19, 2024, 2:20pm

Hey @Henry_Junior,

Welcome to the community!

The quick answers are…

1. Within n8n, the only “native” way to work with Browserless would be the HTTP Request node, unless you can get the community node working.
2. I guess this all depends on what you are actually trying to do. Is this for monitoring your own sites, or are you planning to offer this as some kind of service?
3. ScrapingBee may be of use.
4. I don’t see why you wouldn’t be able to connect to the Browserless API even with changedetection working. How you make the request would depend on which API you are trying to call: REST APIs For Common File Outputs
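For calling Browserless directly, here is a minimal Python sketch of the request the HTTP Request node would need to send to one of those REST endpoints. It assumes the Browserless container is reachable from the n8n host at a hypothetical `http://browserless.example.internal:3000` (3000 is the Browserless default port), that `/content`, `/screenshot` and `/pdf` are the paths your image serves (v2 images may use browser-prefixed routes such as `/chromium/content`, so check the docs for your version), and that a token, if you configured one, goes in the `token` query parameter. `build_browserless_request` and `fetch` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Output kind -> REST path. These paths are assumptions based on the
# Browserless REST docs; adjust them if your image uses prefixed routes.
ENDPOINTS = {
    "html": "/content",
    "screenshot": "/screenshot",
    "pdf": "/pdf",
}


def build_browserless_request(base_url: str, kind: str, target_url: str,
                              token: Optional[str] = None,
                              **extra: object) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to a Browserless REST endpoint."""
    path = ENDPOINTS[kind]
    query = "?" + urlencode({"token": token}) if token else ""
    # Extra fields are passed through unchanged; which ones Browserless
    # accepts depends on the endpoint.
    payload = {"url": target_url, **extra}
    return urllib.request.Request(
        base_url.rstrip("/") + path + query,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(req: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body (HTML, PNG or PDF bytes)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


req = build_browserless_request("http://browserless.example.internal:3000",
                                "screenshot", "https://example.com",
                                token="YOUR_TOKEN")
print(req.get_method(), req.full_url)
# POST http://browserless.example.internal:3000/screenshot?token=YOUR_TOKEN
# png_bytes = fetch(req)  # only once the host is actually reachable
```

In the HTTP Request node this maps to method POST, that URL, a JSON body such as `{"url": "https://example.com"}`, and, for screenshots and PDFs, treating the response as a file rather than JSON. Keeping request building separate from `fetch` lets you check the URL and body before anything hits the network.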

Daniel_Raffel · September 15, 2024, 7:52pm

I gave this a try and ended up using the HTTP Request node. I described the Docker setup and shared some n8n workflows here: How To Streamline Web Tasks by Integrating Browserless, Playwright and ChangeDetection.io with n8n - Daniel's Journal

minhlucvan · October 2, 2024, 8:09am

Hi @Henry_Junior, if you want to go further with Browserless, you might want to check out my node:

The node provides integration with self-hosted or cloud Browserless instances.

Hope that helps!

system · December 31, 2024, 8:10am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.