Best approach for login + document download from multiple B2B portals (low frequency, monthly)

Hi everyone,

I’m working on a workflow where I need to log in to multiple B2B supplier portals and download a small number of documents (mostly PDFs).

Context:

  • Around 15–20 different supplier websites

  • Mostly B2B energy providers

  • Access is via private / reserved areas (username + password)

  • No public APIs available (or only enterprise-level integrations)

  • Execution frequency is very low: 1–2 runs per month per site

  • Each run usually means:

    • login

    • navigate to a specific section

    • download 1–5 documents

  • No aggressive scraping, no parallel requests, human-like behavior

The goal is simply to replicate what a human operator does manually today, but in an automated and repeatable way.


My current questions

  1. In your experience, what is the best technical approach in n8n for this kind of scenario?

    • HTTP Request node with cookies/session handling?

    • Browser automation (Playwright / Puppeteer)?

    • External services integrated via Execute Command / Code node?

  2. For sites with:

    • heavy JS

    • dynamic DOM

    • protected download buttons

    is Playwright the most reliable solution, in your opinion?

  3. Any best practices to avoid account blocking, given:

    • very low frequency

    • sequential requests

    • realistic delays between actions?

  4. Would you recommend:

    • one generic Playwright workflow parametrized per site

    • or one workflow per supplier?


Constraints / notes

  • Using code is not a problem (JS / TS / Python are fine)

  • Credentials will be handled securely via n8n credentials

  • No CAPTCHA-solving services planned (if a CAPTCHA appears → manual fallback is acceptable)

  • Main priority: stability over performance


What I’m looking for

Real-world advice from people who:

  • already automated login + download flows

  • used Playwright with n8n

  • dealt with B2B portals without APIs

Any architectural suggestions, node recommendations, or links to similar implementations are very welcome.

Thanks in advance!

Hi @JJJorg,

I don’t have hands-on experience with Playwright specifically, but I’m sure I can help you figure this out. In the past I have used the Puppeteer community node to get past Cloudflare bot detection, but in most cases the HTML node for extracting information should be good enough. Essentially, all you need to do is read the download URLs from the HTML and then use a simple HTTP Request node to download the files.
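In n8n the HTML node does the link extraction without code. For reference, here is a minimal sketch of the same idea in plain Python (standard library only), e.g. for an external script. It assumes the download links are ordinary `<a href="...">` tags ending in `.pdf` and present in the server-rendered HTML; links that are only created by JavaScript will not show up this way, which is where browser automation comes in. The portal URL and file names are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a> tags whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the URL of the page we fetched
        absolute = urljoin(self.base_url, href)
        # Check only the path, so query strings (e.g. ?token=...) don't hide the suffix
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in self.links:
            self.links.append(absolute)


def extract_pdf_links(html: str, base_url: str) -> list[str]:
    """Return de-duplicated PDF links in the order they appear in the page."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page snippet from a supplier portal
page = '<a href="/docs/invoice_2024_05.pdf">May invoice</a> <a href="/help">Help</a>'
print(extract_pdf_links(page, "https://portal.example.com/account/"))
# ['https://portal.example.com/docs/invoice_2024_05.pdf']
```

Each resulting URL would then go to an HTTP Request node (with the session cookie attached) to fetch the file itself.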

  1. The only way I have scraped information from private web apps is by copying a page's request as cURL from Chrome DevTools and then hardcoding the session/cookie data in the HTTP Request node. That session does expire, of course, so in some cases you’ll need to simulate a user login, store the cookie, and then inject it into the HTTP calls. As mentioned, I have used Puppeteer to get past security checks, but 95% of the time the HTML Extract node worked perfectly.
  2. For simulating button clicks, Playwright should be a solid option.
  3. You could use the Wait node to pause for a few seconds between actions.
  4. I would recommend creating a sub-workflow for each supplier, since the way you scrape, navigate, and download files will be very specific to each portal.
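On point 1 (simulate the login, store the cookie, inject it): at 1–2 runs per month a hardcoded cookie will most likely have expired between runs, so logging in at the start of each run is the more robust option. A minimal sketch, assuming the login request returns its session in ordinary `Set-Cookie` headers (the cookie names below are made up): it turns those headers into the single `Cookie` header you would pass to the following HTTP calls.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values: list[str]) -> str:
    """Turn the Set-Cookie headers of a login response into one Cookie header."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # Each Set-Cookie value is one cookie plus attributes (Path, HttpOnly, ...)
        jar.load(value)
    # Only the name=value pairs are sent back; the attributes are dropped
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical Set-Cookie headers returned by a portal's login POST
login_headers = ["SESSIONID=abc123; Path=/; HttpOnly; Secure", "lang=en; Path=/"]
print(cookie_header_from_set_cookie(login_headers))
# SESSIONID=abc123; lang=en
```

This ignores domain, path, and expiry rules; if a portal spreads its session across several domains or paths, Python's `http.cookiejar` handles those rules more completely.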
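On point 3 (pausing between actions): a fixed Wait is fine, but a randomized pause is closer to how a person clicks through a portal. A minimal sketch, assuming each step (login, navigate, download) is wrapped in a plain function; the 2–6 second range is an arbitrary example, not a known safe threshold for any portal. The `sleep` parameter is injectable only so the logic can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Optional, Sequence


def jittered_delay(min_s: float, max_s: float, rng: random.Random) -> float:
    """Pick a random pause, in seconds, between min_s and max_s."""
    if not 0 <= min_s <= max_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    return rng.uniform(min_s, max_s)


def run_steps_sequentially(
    steps: Sequence[Callable[[], None]],
    min_s: float = 2.0,
    max_s: float = 6.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> list[float]:
    """Run steps strictly one after another with a random pause between them.

    Returns the pauses that were used, which is handy for logging.
    """
    rng = rng or random.Random()
    pauses: list[float] = []
    for i, step in enumerate(steps):
        step()
        if i < len(steps) - 1:  # no pause needed after the last step
            pause = jittered_delay(min_s, max_s, rng)
            pauses.append(pause)
            sleep(pause)
    return pauses
```

Usage would look like `run_steps_sequentially([login, open_documents_page, download_files])`, where those three functions are hypothetical placeholders for your own steps. Inside n8n, a similar effect can be had by computing a random number of seconds in a Code node and feeding it to the Wait node.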
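On point 4 (one sub-workflow per supplier): the structure can be sketched as a registry of per-supplier handlers that a parent loop calls one at a time, so a failure on one portal (for example a CAPTCHA, which you plan to handle manually) is recorded and the remaining suppliers still run. The supplier names and handler bodies below are hypothetical stand-ins for the real login/navigate/download logic. In n8n this would roughly correspond to a parent workflow calling each supplier's sub-workflow through the Execute Workflow node.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SupplierResult:
    supplier: str
    files: list[str] = field(default_factory=list)
    error: str = ""  # non-empty means "needs manual follow-up"


# Hypothetical handlers: each one would contain the supplier-specific
# login, navigation and download steps (the "sub-workflow").
def fetch_supplier_a() -> list[str]:
    return ["invoice_a.pdf"]


def fetch_supplier_b() -> list[str]:
    raise RuntimeError("CAPTCHA shown, manual download needed")


HANDLERS: dict[str, Callable[[], list[str]]] = {
    "supplier_a": fetch_supplier_a,
    "supplier_b": fetch_supplier_b,
}


def run_all(handlers: dict[str, Callable[[], list[str]]]) -> list[SupplierResult]:
    """Call each supplier handler in turn; one failure does not stop the others."""
    results: list[SupplierResult] = []
    for name, handler in handlers.items():
        try:
            results.append(SupplierResult(name, files=handler()))
        except Exception as exc:  # record it for manual fallback instead of aborting
            results.append(SupplierResult(name, error=str(exc)))
    return results
```

Keeping the shared pieces (cookie handling, delays, saving files) in common helpers while each supplier gets its own handler is one way to combine both options from question 4.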