Help: Use Puppeteer to control local Chrome from within n8n (or other ways to scrape logged-in websites)

Self-hosted n8n v1.80.3

I’m having trouble figuring out the best way to scrape websites that require being logged in from within n8n. I’ve made a couple of different scripts/methods that work locally; I just can’t get that last little bit to make them work within n8n.

Maybe I’m making it too hard, but I wasn’t able to get the authentication to work with the HTTP node, which led me down the Puppeteer rabbit hole.

An example website I’m trying to scrape is www.economist.com.

What I’ve tried:
Method 0: HTTP Request node. I can’t authenticate and then maintain the session. If someone can tell me how to make this work, I can drop all the other stuff lol
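For what it’s worth, the pattern that makes the plain-HTTP approach work on *some* sites (assuming ordinary cookie-based sessions, which a JS-heavy login flow like The Economist’s may not use) is: POST the login form once, capture the Set-Cookie headers, and replay them as a Cookie header on every article request. A minimal sketch with a hypothetical login endpoint and field names:

```javascript
// Sketch: log in with one form POST, then reuse the session cookies.
// The login URL and field names below are hypothetical placeholders.

// Turn an array of Set-Cookie header values into a Cookie request header,
// dropping attributes like Path, HttpOnly, and Expires.
function cookieHeader(setCookieValues) {
  return setCookieValues
    .map(v => v.split(';')[0].trim())
    .join('; ');
}

async function fetchLoggedIn(loginUrl, articleUrl, username, password) {
  const login = await fetch(loginUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ username, password }),
    redirect: 'manual', // keep the Set-Cookie from the login response itself
  });
  const cookies = login.headers.getSetCookie(); // needs Node 18.14+
  const article = await fetch(articleUrl, {
    headers: { Cookie: cookieHeader(cookies) },
  });
  return article.text();
}
```

In n8n terms, the same idea is two HTTP Request nodes: the first does the login POST with "Full Response" enabled, the second takes the cookies from the first response via an expression.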

Method 1: Puppeteer community node with a Browserless remote browser

Works great on websites that don’t need authentication.

Issue: authentication/login is the biggest problem. I think I also need a headful browser for the sites I’m visiting, but only because I’ve only been able to make the authentication flow work with a headful browser.



const sleep = ms => new Promise(res => setTimeout(res, ms));

// The community node's custom-script mode already provides $browser and $page,
// so there's no need to open another page with $browser.newPage().

const articles = [
  {
    Title: "Donald Trump: the would-be king",
    url: "https://www.economist.com/leaders/2025/02/20/donald-trump-the-would-be-king"
  },
  {
    Title: "How Europe must respond as Trump and Putin smash the post-war order",
    url: "https://www.economist.com/leaders/2025/02/20/how-europe-must-respond-as-trump-and-putin-smash-the-post-war-order"
  }
];

await $page.goto('https://myaccount.economist.com/s/login/', { waitUntil: 'networkidle2', timeout: 10000 });
await $page.waitForSelector('input[name="username"]', { timeout: 10000 });

await $page.type('input[name="username"]', '[email protected]');
await $page.type('input[name="password"]', 'password');

await $page.click('button[type="submit"]');
await sleep(3000); // give the login redirect time to settle

await $page.evaluate(() => {
  document.querySelector('[data-test-id="masthead-login-link"]').click();
});
await sleep(3000);

const extractedArticles = [];

for (const article of articles) {
    console.log(`Processing article: ${article.Title}`);

    await $page.goto(article.url, { waitUntil: 'networkidle2', timeout: 10000 });
    await sleep(3000);

    const articleText = await $page.evaluate((selector) => {
      const paragraphs = document.querySelectorAll(selector);
      return Array.from(paragraphs).map(p => p.textContent).join("\n");
    }, "div.css-80cr42.e1lrptjp2 p.css-1l5amll.e1y9q0ei0");

    const publishTime = await $page.evaluate(() => {
      const timeElement = document.querySelector('time');
      return timeElement ? timeElement.textContent.trim() : null;
    });

    extractedArticles.push({
      title: article.Title,
      content: articleText,
      publishTime: publishTime
    });
}

console.log(extractedArticles);


Method 2: Puppeteer community node driving a remote-debug Chrome running locally

Issue: I can’t get Puppeteer within n8n to connect to the WebSocket.

Puppeteer run locally connects with no issues. At first I thought it was a Docker issue, but I’m not sure that’s the case anymore, since the same node does connect to the Browserless WebSocket, and local Puppeteer can drive the remote-debug Chrome.

Basically the same code, but with a different WebSocket endpoint.
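If n8n runs in Docker, note that `ws://127.0.0.1:9222` points at the n8n container itself, not the host's Chrome, which would explain Browserless working while local Chrome doesn't. A sketch of one workaround, assuming Chrome was started with `--remote-debugging-port=9222` and the container can reach the host (e.g. via the Docker bridge gateway IP such as `172.17.0.1`, or a port forward, since Chrome binds the debug port to 127.0.0.1 and rejects non-IP Host headers): ask Chrome's HTTP endpoint for the current browser WebSocket URL, then rewrite its host before connecting. This also avoids hard-coding the GUID in the endpoint, which changes every time Chrome restarts.

```javascript
// Rewrite the host:port in the webSocketDebuggerUrl that Chrome reports
// (it always says 127.0.0.1), so a containerized client can reach it.
function wsUrlForHost(versionJson, hostPort) {
  const u = new URL(versionJson.webSocketDebuggerUrl);
  const [host, port] = hostPort.split(':');
  u.hostname = host;
  u.port = port;
  return u.toString();
}

// Usage sketch: resolve the endpoint at runtime instead of pasting a GUID.
// The gateway address is an assumption; check your Docker network setup.
async function connectFromDocker(puppeteer, hostPort = '172.17.0.1:9222') {
  const res = await fetch(`http://${hostPort}/json/version`); // Node 18+ global fetch
  const version = await res.json();
  return puppeteer.connect({ browserWSEndpoint: wsUrlForHost(version, hostPort) });
}
```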

Method 3: Code node, also with remote-debug Chrome
Issue: I can’t get Puppeteer to run at all.
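For what it's worth, the Code node blocks `require()` of external modules by default; self-hosted n8n can allow it with the `NODE_FUNCTION_ALLOW_EXTERNAL` environment variable, and the module must also be installed where n8n actually runs (inside the container for a Docker setup). A config sketch, assuming the official image; the module name depends on what you install:

```shell
# Allow the Code node to require an external module (module name is an
# example; it must match what you npm-installed inside the container).
docker run -it --rm \
  -e NODE_FUNCTION_ALLOW_EXTERNAL=puppeteer-core \
  -p 5678:5678 \
  docker.n8n.io/n8nio/n8n
```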

const puppeteer = require('puppeteer');

(async () => {
    const wsChromeEndpointurl = 'ws://127.0.0.1:9222/devtools/browser/250d5b06-6dc7-4a78-8e95-5798b16c9626';
    const browser = await puppeteer.connect({
        browserWSEndpoint: wsChromeEndpointurl
        // note: headless is a launch option; puppeteer.connect() ignores it
    });

    const sleep = ms => new Promise(res => setTimeout(res, ms));

    try {
        // Define the input array of titles and links
        const articles = [
            {
                Title: "Donald Trump: the would-be king",
                url: "https://www.economist.com/leaders/2025/02/20/donald-trump-the-would-be-king"
            },
            {
                Title: "How Europe must respond as Trump and Putin smash the post-war order",
                url: "https://www.economist.com/leaders/2025/02/20/how-europe-must-respond-as-trump-and-putin-smash-the-post-war-order"
            }
        ];

        const extractedArticles = [];

        for (const article of articles) {
            const page = await browser.newPage(); // Create a new page for each article
            try {
                console.log(`Processing article: ${article.Title}`);

                // Navigate to the article page
                await page.goto(article.url, { waitUntil: 'networkidle2', timeout: 60000 });
                await sleep(3000);

                // Extract the article content
                const articleText = await page.evaluate((selector) => {
                    const paragraphs = document.querySelectorAll(selector);
                    return Array.from(paragraphs).map(p => p.textContent).join("\n");
                }, "div.css-80cr42.e1lrptjp2 p.css-1l5amll.e1y9q0ei0");

                const publishTime = await page.evaluate(() => {
                    const timeElement = document.querySelector('time'); // Adjust the selector as needed
                    return timeElement ? timeElement.textContent.trim() : null;
                });

                // Add the extracted content to the new array
                extractedArticles.push({
                    title: article.Title,
                    content: articleText,
                    publishTime: publishTime
                });

            } catch (error) {
                console.error(`Error processing article ${article.Title}:`, error);
            } finally {
                await page.close(); // Close the page after processing each article
            }
        }

        console.log("Extracted Articles:", extractedArticles);

    } catch (error) {
        console.error("Error during the process:", error);
    } finally {
        // Comment out or remove the browser.close() line to keep the browser open
        // await browser.close();
    }
})();

Script that works locally:

(Identical to the Method 3 code above; the only difference is that it's run with plain node on the host instead of inside the n8n Code node.)

Check out https://apify.com/. They offer services that make scraping much easier; you can also configure custom scrapers and full-on Playwright wrappers for websites.

I already have it working locally; it's really about getting it pulled into n8n.

It is absolutely possible. In fact, I built a job-application automation the same way. A few questions:

  1. Are you running n8n in Docker or directly on your machine?
  2. Can you access your local Chrome remote debugging port from where n8n is running?

Did you npm install puppeteer, and also update your package.json?

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.