I've been cooking up something amazing this week: the ScrapeNinja Recursive Web Crawler, packaged as an n8n community node.
It is a full-fledged, open-source recursive web crawler that traverses websites according to page URL rules and stores all pages in a Postgres database.
It uses ScrapeNinja web scraping engines (via the API) to scrape each HTML page, extract all links from the page, feed them into a queue, and repeat until the page limit is reached or there are no more unique links to crawl.
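For the curious, here is roughly what that loop looks like. This is just a minimal sketch of the idea, not the node's actual implementation: the `fetchHtml` helper stands in for the ScrapeNinja API call, and the option names are placeholders.

```typescript
// Minimal sketch of the crawl loop, assuming a fetchHtml() helper that calls the
// ScrapeNinja scraping API; the option names below are placeholders, not the
// node's real parameters.
type CrawlOptions = {
  startUrl: string;
  urlPattern: RegExp; // only links matching this rule get enqueued
  maxPages: number;   // stop once this many pages have been scraped
};

async function crawl(
  opts: CrawlOptions,
  fetchHtml: (url: string) => Promise<string>,
): Promise<Map<string, string>> {
  const queue: string[] = [opts.startUrl];
  const visited = new Set<string>();
  const pages = new Map<string, string>(); // url -> html; stands in for the Postgres table

  while (queue.length > 0 && pages.size < opts.maxPages) {
    const url = queue.shift()!;
    if (visited.has(url)) continue;
    visited.add(url);

    const html = await fetchHtml(url);
    pages.set(url, html);

    // Pull href attributes out of the page, resolve them against the current URL,
    // and enqueue only unseen links that match the crawl rule.
    for (const match of html.matchAll(/href="([^"]+)"/g)) {
      try {
        const next = new URL(match[1], url).toString();
        if (!visited.has(next) && opts.urlPattern.test(next)) {
          queue.push(next);
        }
      } catch {
        // ignore malformed hrefs
      }
    }
  }
  return pages;
}
```

In the node itself the scraped pages land in the Postgres database rather than an in-memory map.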
Here’s my primary use case: building knowledge bases. I want to crawl all the documentation of a product (roughly 30 to 70 pages), extract the primary content from each page, and convert it all into one huge Markdown document that can later be fed into an LLM. This is a real-world scenario: ScrapeNinja now ships a compiled Markdown file (a knowledge base) containing all of its docs, which you can download and drop into the LLM of your choice to integrate ScrapeNinja into any project that needs reliable web scraping.
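If you want to do that last compilation step yourself, here is a rough sketch of how the knowledge base could be assembled from the crawled pages, assuming they sit in a Postgres table named `pages` with `url` and `markdown` columns (those names are my assumption, not the node's actual schema), using the `pg` package:

```typescript
// Rough sketch of the "compile everything into one Markdown file" step, assuming the
// crawled pages sit in a Postgres table called pages with url and markdown columns
// (my assumption, not the node's actual schema). Uses the pg package.
import { writeFileSync } from "node:fs";
import { Client } from "pg";

async function buildKnowledgeBase(connectionString: string, outFile: string) {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    const { rows } = await client.query(
      "SELECT url, markdown FROM pages ORDER BY url"
    );
    // One section per crawled page, with separators so an LLM can tell pages apart.
    const doc = rows
      .map((r) => `## ${r.url}\n\n${r.markdown}`)
      .join("\n\n---\n\n");
    writeFileSync(outFile, doc);
  } finally {
    await client.end();
  }
}
```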
Yes, you read that right: I used the ScrapeNinja web crawler to crawl the ScrapeNinja docs. I need your feedback on the ScrapeNinja n8n crawler! Note that this is an experimental release, and the tool is advanced and powerful. The node emits detailed crawler logs as its JSON output, and you can also poll the Postgres crawler_logs table to track a crawler run (which can take many minutes to finish). I am using cloud-hosted Supabase as my Postgres instance and can recommend it to anyone.
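As an illustration, here is one way you could poll that table from a small script while the node runs. The table name comes from the node, but the columns used here (created_at, message) are assumptions, so adjust them to the real schema:

```typescript
// Hypothetical polling loop for the crawler_logs table; the table name comes from
// the node, but the columns (created_at, message) are assumptions. Stop it with Ctrl+C.
import { Client } from "pg";

async function tailCrawlerLogs(connectionString: string, intervalMs = 5000) {
  const client = new Client({ connectionString });
  await client.connect();
  let lastSeen = new Date(0);
  try {
    while (true) {
      // Fetch only rows newer than the last one we printed.
      const { rows } = await client.query(
        "SELECT created_at, message FROM crawler_logs WHERE created_at > $1 ORDER BY created_at",
        [lastSeen]
      );
      for (const row of rows) {
        console.log(`[${row.created_at.toISOString()}] ${row.message}`);
        lastSeen = row.created_at;
      }
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  } finally {
    await client.end();
  }
}
```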
To install it on self-hosted n8n, open Settings → Community Nodes, enter “n8n-nodes-scrapeninja”, install, and start scraping! The crawler node requires a ScrapeNinja API key. Make sure your ScrapeNinja n8n node version is at least 0.4.0.
Video demo: