Airtable -> 1600 domains x 10 URLs each -> Airtable

I'm brand new to n8n and have spent many hours trying to resolve this.

I need help with:
Airtable -> Get the top 10 URLs (top-level/main navigation pages/Google links) from each of 1600 company websites -> Airtable.

MVP: Airtable -> Top 10 URLs from each of 10 company websites -> Airtable.

I have:

  • A table in Airtable with:
    — List with company names in the first column in Airtable
    — List with domains (without https or http) in the second column
    — Separate columns for TopURL1 through to TopURL10
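
To make the table layout concrete, here is a minimal Python sketch of how a list of found URLs could be mapped onto the TopURL1 to TopURL10 columns. It is not taken from my workflow: the function names are hypothetical, leaving unused columns as empty strings is an assumption, and so is the idea that every domain is reachable over https.

```python
def normalize_domain(domain):
    """Turn a bare domain (stored without http/https) into a base URL.

    Assumption: the site is served over https.
    """
    return "https://" + domain.strip().strip("/")


def to_airtable_fields(urls, max_urls=10):
    """Map up to max_urls URLs onto the columns TopURL1..TopURL10.

    Columns without a URL are filled with an empty string (an assumption).
    """
    fields = {}
    for i in range(max_urls):
        fields[f"TopURL{i + 1}"] = urls[i] if i < len(urls) else ""
    return fields
```

The point of the sketch is that each company row gets its own dictionary of fields, so URLs from one company cannot spill into another company's row.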

I want to avoid:

  • Random URLs being included, e.g. news, blogs, affiliate links, redirects, and any other URLs that are not relevant
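
The kind of filtering I have in mind could look roughly like the Python sketch below. It is only an illustration under assumptions: the excluded keyword list and the function name are made up, "top level" is taken to mean at most one path segment, and the domain is assumed to be stored without "www.". Redirects cannot be detected from the URL text alone, so this sketch does not handle them.

```python
from urllib.parse import urlparse

# Hypothetical list of path segments that usually mark non-navigation pages.
EXCLUDED_SEGMENTS = {"news", "blog", "blogs", "press", "tag", "category"}


def is_relevant(url, domain):
    """Return True if url looks like a top-level page on the given domain."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop links that point to another site (affiliate links, social media).
    if host != domain.lower():
        return False
    segments = [s for s in parsed.path.lower().split("/") if s]
    # Assumption: main navigation pages sit at most one level below the root.
    if len(segments) > 1:
        return False
    return not any(s in EXCLUDED_SEGMENTS for s in segments)
```

For example, `is_relevant("https://www.example.com/about", "example.com")` passes, while a `/blog` page or a link to a different domain is rejected.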

What I have tried:

  1. Followed instructions from Claude Sonnet and created a workflow that executed, but I made a mistake in the prompt, and it helped me generate 100 URLs based on keywords for typical main navigation pages in the industry. I was really happy when it seemed to have worked, not so happy when I realised that all the URLs were made up (ugh, hurts to admit this).

  2. Followed instructions from ChatGPT and used SerpAPI with site: in an n8n workflow, and it returned a lot of empty URLs. I did manage to get the workflow to work eventually, but when I checked the URLs that were added to Airtable, it had added the same single URL from one company to all 10 URL columns for all 10 companies. Meaning 100 copies of one page from one company.

  3. Together with Claude I also tried adding Apify to the workflow, but without success. It gave me 1 URL for 1 company, and it was the home page, which I already have for each company.

  4. I then found a similar question on the n8n forum from 2023 where they said to add an HTML Extract node, but I could not find it in the node list. I then looked at the other HTML nodes and... I decided to post this question. I am completely stuck.
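
As far as I understand it, the HTML extraction idea amounts to fetching a company's home page and pulling the links out of its navigation menu. The Python sketch below shows that idea using only the standard library. It is an assumption-heavy illustration, not the n8n node's behaviour: the class and function names are hypothetical, it assumes the site wraps its menu in a `<nav>` element, and it expects the page HTML to be passed in as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NavLinkParser(HTMLParser):
    """Collect href values of <a> tags that appear inside <nav> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.nav_depth = 0  # greater than 0 while inside a <nav>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/about" against the site.
                url = urljoin(self.base_url, href)
                if url not in self.links:
                    self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def extract_nav_links(html, base_url, limit=10):
    """Return up to `limit` unique navigation links found in the HTML."""
    parser = NavLinkParser(base_url)
    parser.feed(html)
    return parser.links[:limit]
```

Sites that build their menu with JavaScript or without a `<nav>` element would return nothing here, so this would only cover part of the 1600 domains.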

I'm trying to understand whether I can achieve this with n8n, and whether I should set up an automation or create an agent. Maybe an agent that visits each site and understands which pages/URLs should be collected in a list would work better? I am trying to avoid scraping loads of random URLs and then having to clean up and sort a large amount of data.

I don’t have coding experience, but I am able to pick things up fast and have no problem following instructions. Could someone please help me? If you can explain which node(s) I can use for this, I’ll work out the configuration. Even just confirming whether it can be done at all, and whether this sounds like an automation or an agent job, would help.

Thanks a lot
