[NEW] ScrapeNinja official integration with n8n: web scraping API with rotating proxies and real browser

ScrapeNinja is a web scraping API SaaS that can run a real browser and capture website screenshots. I use it with n8n every day to scrape thousands of web pages and extract useful data.

Ever since I published my first n8n web scraping tutorials using the ScrapeNinja API (via the n8n HTTP Request node), I’ve received tons of questions from n8n users, understandably so: ScrapeNinja offers many parameters to control proxies, browser behavior, and more. Putting all these params into a big JSON object and sending it to the ScrapeNinja API endpoint, all via the n8n HTTP node, was not exactly user-friendly.
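For context, this is roughly the kind of request body I mean; the parameter names approximate the ScrapeNinja /scrape docs, so treat this as illustrative rather than copy-paste ready:

```javascript
// Roughly the kind of JSON body you had to hand-craft in the HTTP Request
// node. Parameter names approximate the ScrapeNinja /scrape docs; verify
// them there before copying anything.
const scrapeRequestBody = {
  url: "https://example.com/some-page",      // target page
  geo: "us",                                 // rotating proxy geolocation
  retryNum: 2,                               // retry failed attempts
  headers: ["Referer: https://example.com"]  // extra request headers
};
// This body was POSTed to the ScrapeNinja /scrape endpoint, with the API key
// set in a separate header: exactly the boilerplate the new node hides.
```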

I’m now launching an official ScrapeNinja n8n integration: a full-fledged n8n node. Check out the package on npm: n8n-nodes-scrapeninja.

To install it on self-hosted n8n, open Settings → Community Nodes, enter “n8n-nodes-scrapeninja”, install, and start scraping! Two modes are available:

• /scrape: A fast scraper using raw network requests.

• /scrape-js: Runs a real browser with JS evaluation (and can capture screenshots).

For more info, check out the ScrapeNinja docs.

Both modes share many parameters, and you can switch between them easily. They’re also fully compatible with the awesome JS extractor feature, which lets you write a snippet of JavaScript to pull specific data from a page’s HTML:
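For example, an extractor is just a function that receives the raw page HTML plus a cheerio instance and returns JSON. Here is a minimal sketch; the selectors are hypothetical, and the exact function contract is described in the ScrapeNinja docs:

```javascript
// A minimal extractor sketch: the function receives the raw page HTML and a
// cheerio instance, and whatever it returns becomes your clean JSON output.
// The selectors below are hypothetical; adapt them to your target page.
function extract(input, cheerio) {
  const $ = cheerio.load(input);
  return {
    title: $("h1").first().text().trim(),
    price: $(".price").first().text().trim(),
    links: $("a[href]").map((i, el) => $(el).attr("href")).get()
  };
}
```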

I would really appreciate feedback from the n8n community.


Here is an example n8n workflow with ScrapeNinja:

Take screenshots of hundreds of websites and put the screenshot URLs into Google Sheets:


Hi @Anthony

Awesome stuff, always nice to see new nodes being published. 🙂

I have a question though: what do you mean by “official”?
Are you in any way involved with the company behind ScrapeNinja?

I am the founder of ScrapeNinja. Thanks @BramKn!


Awesome, thanks for clarifying and creating the node 🙂


Nicely done, but setting authentication ain't straightforward.

CF bypassing without any issue <3

Nicely done, but setting authentication ain't straightforward.

Thanks! Please tell me more. Do you mean that it is not clear where to get an API key?

No, I have installed the node and it's complaining about credentials.

Oh, no…
This does not look good. Of course you don’t need to send the API key as a header!
This is how it should look:

It turns out some users still experience this, even though I thought I had fixed it for all new installs. What n8n version are you using? Can you try updating to the latest one, or just restarting your n8n instance? I am still not sure why this happens.

If you have n8n logs, please send them to me; they should contain some “warn” or “error” entries regarding credentials. I appreciate your feedback!

I don't even have a credentials field 😄

2|n8n | Failed to load Custom API options for the node “n8n-nodes-scrapeninja.scrapeNinja”: Unknown credential name “scrapeninjaApi”

If I start the workflow:

2|n8n | Node does not have any credentials set
2|n8n | Error: Node does not have any credentials set
2|n8n | at new NodeOperationError (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/src/errors/node-operation.error.ts:22:12)
2|n8n | at ExecuteContext._getCredentials (/usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/node-execution-context/node-execution-context.ts:237:12)
2|n8n | at ExecuteContext.getCredentials (/usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/node-execution-context/base-execute-context.ts:96:21)
2|n8n | at ExecuteContext.execute (/root/.n8n/nodes/node_modules/n8n-nodes-scrapeninja/nodes/ScrapeNinja/ScrapeNinja.node.ts:311:34)
2|n8n | at WorkflowExecute.runNode (/usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1097:31)
2|n8n | at /usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1505:38
2|n8n | at processTicksAndRejections (node:internal/process/task_queues:95:5)
2|n8n | at /usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:2066:11

Thank you.
This means some caching is involved here: this bug was fixed in v0.1.5.

Please make sure you are using the latest version of the package (at least v0.1.5).

I have just pushed an update to the ScrapeNinja n8n node (v0.3.0) which could turn n8n into a serious web scraping machine.

  1. New operation: extract primary content from HTML. It uses the awesome Mozilla Readability package, which smartly extracts the primary text corpus from a page.
  2. New operation: clean up HTML. It uses the Cheerio package to smartly traverse the HTML and omit things that are not strictly important: script tags, iframes, various HTML attributes, onClick handlers, HTML comments, and whitespace. This is especially useful for later LLM processing stages where every token counts (see the sketch after this list).
  3. Custom extractor JS evaluation. This is, I think, the most impactful feature: you can now leverage an LLM inside an n8n workflow to generate JS extractor code, which is then evaluated securely by this node, so you get clean JSON data from ANY web page! Self-healing web scrapers are now trivial to build.
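To give an idea of what the cleanup operation does, here is a minimal sketch of the same idea using Cheerio. This is not the node's actual source, just the principle:

```javascript
// Conceptual sketch of the "cleanup HTML" idea using Cheerio. The real node
// is more thorough; this only shows the principle.
const cheerio = require("cheerio");

function cleanupHtml(html) {
  const $ = cheerio.load(html);
  // Drop elements that carry no primary content.
  $("script, style, iframe, noscript").remove();
  // Drop HTML comments.
  $("*").contents().each((i, node) => {
    if (node.type === "comment") $(node).remove();
  });
  // Strip noisy attributes: inline handlers, styles, classes.
  $("*").each((i, el) => {
    for (const name of Object.keys(el.attribs || {})) {
      if (name.startsWith("on") || name === "style" || name === "class") {
        $(el).removeAttr(name);
      }
    }
  });
  // Collapse whitespace so every downstream LLM token counts.
  return $.html().replace(/\s+/g, " ").trim();
}
```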

Wait, here is the best part: all these new features run completely locally and do not require a ScrapeNinja API key (you will need some LLM for the code generation to work, obviously).
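To make the “self-healing” part concrete, the loop can be sketched like this. This is illustrative only, not the node's implementation, and regenerateExtractor is a hypothetical stand-in for whatever LLM step you wire up in n8n:

```javascript
// Conceptual "self-healing" loop: run the stored extractor, and if it breaks
// because the site changed its markup, regenerate it with an LLM and retry.
// regenerateExtractor is a hypothetical stand-in for your LLM step.
const cheerio = require("cheerio");

async function extractWithHealing(html, extractorSource, regenerateExtractor) {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // The real node evaluates extractor code in a sandbox; new Function is
      // used here only to keep the sketch self-contained.
      const extract = new Function(`return (${extractorSource})`)();
      const result = extract(html, cheerio);
      if (result && Object.keys(result).length > 0) return result;
    } catch (err) {
      // fall through and regenerate below
    }
    extractorSource = await regenerateExtractor(html);
  }
  throw new Error("Extraction failed even after regenerating the extractor");
}
```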

To install the ScrapeNinja n8n node on your self-hosted instance, go to Settings → Community Nodes, enter “n8n-nodes-scrapeninja”, and try scraping something. Make sure you are using at least v0.3.0. Let me know how it goes!

Video with explanation:


I have been cooking something amazing this week: the ScrapeNinja Recursive Web Crawler, packed into an n8n community node.

It is a full-fledged, open-source recursive web crawler that traverses websites according to page URL rules and stores all pages in a Postgres database.

It uses the ScrapeNinja web scraping engines (via the API) to scrape each HTML page, extract all links from the page, feed them into a queue, and iterate until the page limit is reached or there are no more unique links to crawl.
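The core loop can be sketched like this (illustrative JavaScript, not the node's source; the four helpers are hypothetical stand-ins):

```javascript
// Simplified crawl loop: breadth-first traversal with URL rules and a page
// limit. scrape, savePage, extractLinks, and matchesRules are hypothetical
// stand-ins for the ScrapeNinja API call, the Postgres insert, link
// extraction, and the page URL rules.
async function crawl(startUrl, maxPages, { scrape, savePage, extractLinks, matchesRules }) {
  const queue = [startUrl];
  const seen = new Set(queue);
  while (queue.length > 0) {
    const url = queue.shift();
    const html = await scrape(url);  // ScrapeNinja /scrape or /scrape-js
    await savePage(url, html);       // persist into Postgres
    for (const link of extractLinks(html, url)) {
      if (seen.size >= maxPages) break;            // respect the page limit
      if (matchesRules(link) && !seen.has(link)) { // page URL rules
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return seen.size; // unique pages crawled
}
```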

Here’s my primary use case: building knowledge bases. I want to crawl all the documentation of a product (30 to 70 pages), extract the primary content from each page, and convert it into one huge Markdown document that can later be fed into an LLM. This is a real-world scenario: ScrapeNinja itself now ships a compiled Markdown file (a knowledge base) containing all of its docs, which you can download and paste into the LLM of your choice to integrate ScrapeNinja into any project where you need reliable web scraping.
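The compilation step itself is simple once the pages are crawled and cleaned; a sketch, assuming a hypothetical { url, title, markdown } storage shape:

```javascript
// Sketch of the knowledge-base step: concatenate the primary content of all
// crawled pages into one big Markdown document for an LLM. The page shape
// ({ url, title, markdown }) is a hypothetical storage format.
function compileKnowledgeBase(pages) {
  return pages
    .map((p) => `# ${p.title}\n\nSource: ${p.url}\n\n${p.markdown}`)
    .join("\n\n---\n\n");
}
```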

Yes, you read that right: I used the ScrapeNinja web crawler to crawl the ScrapeNinja docs. I need your feedback on the ScrapeNinja n8n crawler! Note that this is an experimental release and the tool is advanced and powerful. I made sure to output detailed crawler logs as the node’s JSON output; you can also poll the Postgres crawler_logs table to track a crawler node run (it can take many minutes to finish). I am using cloud Supabase as my Postgres instance and can recommend it to everyone.
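If you want to follow a long run from outside n8n, you can also poll that table directly; a minimal sketch (the ordering column is an assumption, check the actual schema):

```javascript
// Poll the crawler_logs table to follow a long crawler run from outside n8n.
// The ordering column ("id") is an assumption; check the actual table schema.
const { Client } = require("pg");

async function tailCrawlerLogs(connectionString) {
  const client = new Client({ connectionString });
  await client.connect();
  const { rows } = await client.query(
    "SELECT * FROM crawler_logs ORDER BY id DESC LIMIT 20"
  );
  console.table(rows);
  await client.end();
}
```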

To install it on self-hosted n8n, open Settings → Community Nodes, enter “n8n-nodes-scrapeninja”, install, and start scraping! The crawler node requires a ScrapeNinja API key. Make sure your ScrapeNinja n8n node version is at least 0.4.0.

Video demo:
