[NEW] ScrapeNinja official integration with n8n: web scraping API with rotating proxies and real browser

ScrapeNinja is a web scraping API SaaS that can run a real browser and capture website screenshots. I use it with n8n every day to scrape thousands of web pages and extract useful data.

Ever since I published my first n8n web scraping tutorials using the ScrapeNinja API (via the n8n HTTP Request node), I’ve received tons of questions from n8n users. Understandably so, because ScrapeNinja offers many parameters to control proxies, browser behavior, and more. Packing all these parameters into a big JSON object and sending it to the ScrapeNinja API endpoint through the n8n HTTP Request node was not exactly user-friendly.

I’m now launching an official ScrapeNinja n8n integration: a full-fledged activity node. Check out the repo on npm:

To install it on self-hosted n8n, open Settings → Community Nodes, enter “n8n-nodes-scrapeninja”, install, and start scraping! Two modes are available:

• /scrape: A fast scraper using raw network requests.

• /scrape-js: Runs a real browser with JS evaluation (and can capture screenshots).

For more info, check out the ScrapeNinja docs.

Both modes share many parameters, and you can switch between them easily. They’re also fully compatible with the awesome JS extractor feature, which lets you write a snippet of JavaScript to pull specific data from a page’s HTML:
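
To give a rough idea of what such an extractor can look like, here is a minimal TypeScript sketch. It assumes the extractor is a function named `extract` that receives the page HTML and a Cheerio object and returns a plain object; check the ScrapeNinja docs for the exact contract. The `CheerioLike` type and the `h1` selector are illustrative assumptions, not part of the node.

```typescript
// Minimal structural type covering the one Cheerio feature this sketch uses.
// (Hypothetical helper type; the real `cheerio` object offers much more.)
interface CheerioLike {
  load(html: string): (selector: string) => { text(): string };
}

// Extractor sketch: takes the page HTML plus a Cheerio-like object and
// returns a plain, JSON-serializable object.
function extract(input: string, cheerio: CheerioLike): { title: string } {
  const $ = cheerio.load(input);
  return {
    // Text of the matched <h1> element(s), without surrounding whitespace.
    title: $("h1").text().trim(),
  };
}
```

In a real workflow you would replace the `h1` selector with whatever selectors match the data you need, and add more fields to the returned object.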

I would really appreciate n8n community feedback.

Here is an example n8n workflow with ScrapeNinja:

Take screenshots of hundreds of websites, put screenshot URLs into Google Sheets:

Hi @Anthony

Awesome stuff, always nice to see new nodes being published. :slight_smile:

I have a question though, what do you mean by official?
Are you in any way involved with the company behind ScrapeNinja?

I am the founder of ScrapeNinja. Thanks @BramKn !

Awesome thanks for clarifying and creating the node :slight_smile:

Nicely done, but setting up authentication ain't straightforward.

CF bypassing without any issue <3

Thanks! Please tell me more. Do you mean that it is not clear where to get an API key?

No, I have installed the node and it's complaining about credentials.

Oh, no…
This does not look good. Of course you don’t need to send the API key as a header!
This is how it should look:

It turns out some users still experience this, although I thought I had fixed it for all new installs. Which n8n version are you using? Can you try updating to the latest one, or just restarting your n8n instance? I am still not sure why this happens.

If you have n8n logs, please send them to me. They should contain a “warn” or “error” entry regarding credentials. I appreciate your feedback!

I don't even have a credentials field :smiley:

2|n8n | Failed to load Custom API options for the node “n8n-nodes-scrapeninja.scrapeNinja”: Unknown credential name “scrapeninjaApi”

If I start the workflow:

| Node does not have any credentials set
2|n8n | Error: Node does not have any credentials set
2|n8n | at new NodeOperationError (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/src/errors/node-operation.error.ts:22:12)
2|n8n | at ExecuteContext._getCredentials (/usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/node-execution-context/node-execution-context.ts:237:12)
2|n8n | at ExecuteContext.getCredentials (/usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/node-execution-context/base-execute-context.ts:96:21)
2|n8n | at ExecuteContext.execute (/root/.n8n/nodes/node_modules/n8n-nodes-scrapeninja/nodes/ScrapeNinja/ScrapeNinja.node.ts:311:34)
2|n8n | at WorkflowExecute.runNode (/usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1097:31)
2|n8n | at /usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1505:38
2|n8n | at processTicksAndRejections (node:internal/process/task_queues:95:5)
2|n8n | at /usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:2066:11

Thank you.
This means some caching is involved here: this bug was fixed in v0.1.5.

Please make sure you are using the latest version of the package (at least v0.1.5).

I have just pushed an update to the ScrapeNinja n8n node (v0.3.0), which could eventually turn n8n into a serious web scraping machine.

  1. New operation: extract primary content from HTML. Uses the awesome Mozilla Readability package, which smartly extracts the primary text content from a page.
  2. New operation: clean up HTML. Uses the Cheerio package to traverse the HTML and omit things that are not strictly important, like script tags, iframes, various HTML attributes, onClick handlers, HTML comments, and whitespace. This is especially useful for later LLM processing stages where every token counts.
  3. Custom extractor JS evaluation. This is, I think, the most impactful feature. Now you can leverage an LLM inside an n8n workflow to generate JS extractor code, which is later evaluated securely by this node, so you get clean JSON data from ANY web page! Self-healing web scrapers are now trivial to build.
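
To illustrate the kind of cleanup described in item 2, here is a minimal TypeScript sketch. It is not the node's actual code: the node uses Cheerio, a proper HTML parser, while this version relies on regular expressions only to stay dependency-free. The function name `cleanupHtml` and the exact set of stripped attributes are assumptions made for the example.

```typescript
// Simplified, regex-based stand-in for the HTML cleanup operation.
// Assumption: the real node uses Cheerio; regexes are used here only to keep
// the sketch self-contained, and they will miss edge cases a parser handles.
function cleanupHtml(html: string): string {
  return html
    // Drop <script> and <iframe> elements together with their content.
    .replace(/<(script|iframe)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop quoted inline event handlers (onclick="...") and style attributes.
    .replace(/\s+(?:on\w+|style)\s*=\s*(?:"[^"]*"|'[^']*')/gi, "")
    // Collapse runs of whitespace into a single space.
    .replace(/\s+/g, " ")
    .trim();
}
```

For example, a `<div>` carrying an `onclick` handler and a `style` attribute, wrapped around a comment, a script tag, and a `<p>Hello</p>` paragraph, comes out as `<div> <p>Hello</p> </div>`, which is noticeably cheaper to feed into an LLM.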

Wait, here is the best part: all these new features are completely local and do not require a ScrapeNinja API key (you will need some LLM for code generation to work, obviously).

To install the ScrapeNinja n8n node on your self-hosted instance, go to Settings → Community Nodes, enter “n8n-nodes-scrapeninja”, and try scraping something. Make sure you are using at least v0.3.0. Let me know how it goes!

video with explanation:
