Using JavaScript / Function Node to Scrape HTML? Create DOM/document?

Is it possible to use the Function node (JavaScript) to scrape data?
I have used the HTML Extract node to get a subset of a page’s HTML, but I think I have pushed that node as far as it can go and I need to use JavaScript to extract the rest.
I have some working code that converts that HTML to JSON in Chrome DevTools. However, when I run the same code in a Function node in n8n, it throws errors because there is no DOM or document.

    Error: document.querySelectorAll is not a function. ← Most likely cause: the DOM has not been initialised.

Hey @Ben_Hadman,

Welcome to the community :tada:

You should be able to use a Function node to work on HTML data. The problem you will run into is that the data arrives as a plain string, whereas the dev console in a browser gives you an already parsed page and a global document object to query.
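To make that difference concrete, here is a minimal sketch in plain Node.js, which is roughly the kind of environment a Function node runs in. The `html` value is a made-up sample, not real node output, and the variable names are only for illustration.

```javascript
// Hypothetical HTML string, standing in for the output of an earlier node.
const html = '<ul><li>One</li><li>Two</li></ul>';

// Outside a browser there is no global `document`, so calls such as
// document.querySelectorAll(...) fail before they can do anything.
const hasDocument = typeof document !== 'undefined';

// The HTML is just text, so without a parser only string operations apply.
const isString = typeof html === 'string';
const liCount = (html.match(/<li>/g) || []).length;

console.log({ hasDocument, isString, liCount });
```

String matching like this gets brittle quickly, which is why a parser is the more practical route for anything beyond trivial markup.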

Our HTML Extract node uses Cheerio, so you could use the same library in your Function node by allowing it as an external module with the environment variable NODE_FUNCTION_ALLOW_EXTERNAL=cheerio.

Then in your function node you can do something like the below to start playing with HTML data.

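// Load Cheerio (requires NODE_FUNCTION_ALLOW_EXTERNAL=cheerio)
const cheerio = require('cheerio');

// Parse an HTML string into a queryable document
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

// Select the heading and change its text
$('h2.title').text('Hello there!');

// Serialise the modified document back to an HTML string
const output = $.html();

return [{json: {output}}];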

Thanks @jon, that’s very helpful. I’m used to Puppeteer, and Cheerio looks quite similar.

Is it possible to enable external Node libraries when using the ‘Windows Installed’ version of n8n, rather than the version installed through npm?

Also, is it possible to send requests using native ‘fetch’ in a JavaScript function rather than using an external library?