n8n self-hosted definitely has a lot of trouble with caching of community nodes… Please try removing the ScrapeNinja node, rebuilding the Docker container, and re-installing the node. 0.4.1 is the correct version, and the operations list should look different.
Hi Anthony
That did the trick. I had to wait until I got home to upgrade the server and rebuild. Will now test out the new nodes.
Hi @Anthony,
I’m super excited about your scraper. Fantastic work!
I tried to run it on my self-hosted instance and got an error during installation. When I tried to remove it for reinstallation, I got another error, which means I can’t remove it. I also tried to remove it via npm but had no luck with that either. I’m getting a lot of bans on Reddit, and my only alternative for running this easily is here on n8n with your community node. Any suggestions?
Hey @Carlos_Guimaraes, did you manage to find a fix for this? It looks like clearing the cache of your n8n instance and re-installing the n8n ScrapeNinja package might help.
Hi Anthony,
I couldn’t solve the problem. I’m going to clear the cache to see what’s happening. The error must be in my instance, because the problem doesn’t only occur with your node.
Thanks for the feedback.
Hey @Anthony Thanks for this fantastic tool.
I’m now trying out the crawler. However, I need some more clarification on the Postgres credentials needed. I already have Supabase credentials for my own db.
Does this mean I need another Postgres db just for the crawler, or can I use the same Supabase db and the node will create the necessary tables? And I guess it needs to be a Postgres credential (even if it’s Supabase).
[Update] I tried using my own Supabase db (set up as Postgres) and got this error:
Hey! You can totally use an existing Supabase db - just create a new credential in n8n - not a “Supabase” but a “Postgres” connection - and grab your settings from the pooler:
The crawler will create its tables automatically.
regarding self-signed cert problem - maybe you should try the “Ignore SSL Issues (Insecure)” flag in postgres n8n credentials settings page?
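For reference, the Supabase pooler settings for an n8n Postgres credential usually look something like this (hypothetical values — copy the real ones from your Supabase project’s connection settings):

```
Host:     aws-0-eu-central-1.pooler.supabase.com   (hypothetical region)
Port:     6543   (transaction pooler; 5432 for session mode)
Database: postgres
User:     postgres.<project-ref>
Password: <your database password>
```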
Hi again @anthony, disabling the SSL checks did the trick,
but now I’m getting this:
It successfully created the 3 tables in my Supabase instance
Hello all - excited to use this functionality. I’m getting stuck on this error while trying out a basic scrape of http://example.com. Any advice?
this is weird! I have double-checked the code - it should be good. Could you please check your Supabase tables? The crawler_queue table should have a response_status_code column.
regarding the self-signed cert problem - please try activating the “Ignore SSL Issues (Insecure)” flag on the Postgres n8n credentials settings page.
Hello. I’m almost certain the field didn’t exist before. I’ve run it with Re-Set Crawler Tables set to TRUE, and now I can see the field. The node was successful but the run failed. The only error I can see in the log metadata is ‘Invalid message format’ for the first page. What format is this referring to? I’m just passing a URL as a parameter, as per the examples.
could you please share detailed logs? I will try to better understand whether this is related to the ScrapeNinja node. Feel free to contact me via [email protected]
@Anthony hi, can you help please? I’m new to n8n and need to create a workflow. I don’t mind putting in the effort - but I would like to know whether the objective can actually be achieved (your thoughts - please see the link). Can n8n do this natively, or in combination with ScrapeNinja?
Here is an n8n + ScrapeNinja workflow example that explicitly mentions MoMoProxy (a popular rotating proxy service) for users who need advanced proxy rotation:
Example Workflow: Scrape with ScrapeNinja + MoMoProxy Integration
Use Case: Scrape a target page while avoiding blocks by using MoMoProxy’s rotating proxies via ScrapeNinja.
1. Install ScrapeNinja Node

- Go to Settings > Community Nodes in n8n.
- Install `n8n-nodes-scrapeninja`.
Workflow Steps

A. ScrapeNinja Node Configuration

- Mode: `/scrape-js` (for JS-heavy pages) or `/scrape` (raw HTML).
- URL: `https://example.com/products`
- Proxy:
  - Enable Rotating Proxies in ScrapeNinja.
  - (Optional: use MoMoProxy for high-quality residential IP rotation by configuring custom proxy endpoints in ScrapeNinja’s API params.)
- JS Extractor:

```javascript
function extract() {
  return Array.from(document.querySelectorAll('.product')).map(item => ({
    name: item.querySelector('.name')?.innerText,
    price: item.querySelector('.price')?.innerText,
  }));
}
```

- Screenshot: enable if visual verification is needed.
B. Function Node (Optional)

- Clean data (e.g., remove currency symbols):

```javascript
return items.map(item => ({
  ...item.json,
  price: item.json.price.replace('$', '').trim(),
}));
```
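Outside n8n, the same cleanup step can be sketched in plain JS, using hypothetical data in the `item.json` shape that n8n Function nodes receive:

```javascript
// Hypothetical scraped items in n8n's { json: ... } wrapper format
const items = [
  { json: { name: 'Widget', price: '$19.99 ' } },
  { json: { name: 'Gadget', price: '$5.00' } },
];

// Strip the currency symbol and surrounding whitespace from each price
const cleaned = items.map(item => ({
  ...item.json,
  price: item.json.price.replace('$', '').trim(),
}));
// cleaned → [{ name: 'Widget', price: '19.99' }, { name: 'Gadget', price: '5.00' }]
```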
C. Save Output
- Send results to Google Sheets, Airtable, or a database.
Why Mention MoMoProxy?

- ScrapeNinja’s built-in proxies are sufficient for most cases, but services like MoMoProxy offer:
  - Higher anonymity (residential IP rotation).
  - Geotargeting (select proxy locations).
  - Better success rates for aggressive scraping.
- To use MoMoProxy with ScrapeNinja:
  - Get a MoMoProxy endpoint (e.g., `http://user:[email protected]:port`).
  - Pass it as a custom proxy in ScrapeNinja’s `proxy` parameter:

```json
{
  "proxy": {
    "url": "http://user:[email protected]:1234"
  }
}
```
Workflow JSON (MoMoProxy Example)

```json
{
  "nodes": [
    {
      "parameters": {},
      "name": "Start",
      "type": "n8n-nodes-base.start",
      "typeVersion": 1,
      "position": [250, 300]
    },
    {
      "parameters": {
        "operation": "scrapeJs",
        "url": "https://example.com/products",
        "jsExtractor": "function extract() {\n  return Array.from(document.querySelectorAll('.product')).map(item => ({\n    name: item.querySelector('.name')?.innerText,\n    price: item.querySelector('.price')?.innerText\n  }));\n}",
        "proxy": {
          "url": "http://user:[email protected]:1234"
        }
      },
      "name": "ScrapeNinja",
      "type": "n8n-nodes-scrapeninja.scrapeNinja",
      "typeVersion": 1,
      "position": [450, 300]
    }
  ]
}
```

(The `proxy.url` value is the MoMoProxy endpoint.)
Key Notes

- MoMoProxy is optional but recommended for large-scale scraping.
- Test with ScrapeNinja’s default proxies first before adding external services.
- Combine with JS Extractors for precise data extraction.
Can I replicate browser interactions with ScrapeNinja, like clicking ‘Load more’?
I know that for a lot of you guys the biggest pain in web scraping is that we need custom code to extract useful data (JSON) from HTML pages. Most of you probably use a “convert to markdown” → push-to-LLM pipeline, but this does not work well in a lot of scenarios: it’s too expensive and slow, and it just works poorly and inconsistently for complex HTML pages.
Here is my latest attempt to mitigate this:
Agentic AI cheerio code generator - another iteration in my attempt to make heavy-duty web scraping possible for everyone.
The idea is that we feed the huge HTML of the scraped document to the agent and ask it to write a JS extractor, which can later be reused on similar pages to extract the same data from thousands of pages. This way we don’t need to invoke the LLM-and-markdown pipeline for EVERY page: we just create a good JS extractor once, and then run it thousands of times, with low latency and great results, via ScrapeNinja.
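The "write once, run thousands of times" idea can be sketched in plain JS. A real generated extractor would use cheerio selectors; here a regex stands in so the sketch stays dependency-free, and the HTML snippets and `.price` class are hypothetical:

```javascript
// One extractor function, generated once (e.g., by the agent)...
function extractPrices(html) {
  // regex stand-in for a cheerio selector like $('.price')
  return [...html.matchAll(/<span class="price">([^<]+)<\/span>/g)].map(m => m[1]);
}

// ...then reused across many pages without any LLM calls:
const pages = [
  '<span class="price">$10</span><span class="price">$20</span>',
  '<span class="price">$5</span>',
];
const results = pages.map(extractPrices);
// results → [['$10', '$20'], ['$5']]
```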
Let me know your thoughts on this!