I’m currently building a workflow to enrich company data by finding the official website domain based on the company name (sometimes with city / address).
So far I’ve tested a few approaches:
- Perplexity node → good results, but too expensive for my use case
- OpenAI / LLM approach → cheaper, but often returns wrong domains (wrong URLs or unrelated companies)
Goal:
- I’m working with large datasets (50k+ companies)
- Cost needs to stay well below 1 cent per lookup
- Needs to be reasonably reliable, not necessarily 100% perfect
What’s your recommended approach to reliably determine a company’s official website domain at scale, without using expensive APIs?
Any patterns, nodes, or workflows that worked well for you?
My go-to for this kind of research task is Apify’s Google scraper; it starts at around 0.3 cents per search page.
The general idea: Google-search the company’s name in quotes, pull the first few results, and exclude any directory results. It’s very likely that the first remaining result is the company’s website.
You can call this from n8n via the Apify node and handle any post-processing and export of the data there. I just run the workflow directly in Apify and export the results as Excel.
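The filtering step above can be sketched as a small post-processing function. This is just an illustrative sketch: the blocklist of directory sites and the result shape (a list of dicts with a `"url"` key, roughly matching Apify’s organic-results output) are my assumptions, not something fixed by the workflow.

```python
from urllib.parse import urlparse

# Hypothetical blocklist of directory/aggregator sites to skip.
# Extend this with whatever directories show up in your vertical.
DIRECTORY_DOMAINS = {
    "linkedin.com", "facebook.com", "yelp.com", "crunchbase.com",
    "wikipedia.org", "yellowpages.com", "glassdoor.com",
}

def pick_official_domain(organic_results):
    """Return the first result domain that isn't a known directory."""
    for result in organic_results:
        host = urlparse(result["url"]).netloc.lower()
        domain = ".".join(host.split(".")[-2:])  # crude eTLD+1 guess
        if domain not in DIRECTORY_DOMAINS:
            return domain
    return None  # all results were directories

# Example with fabricated search results:
results = [
    {"url": "https://www.linkedin.com/company/acme"},
    {"url": "https://www.acme-corp.com/about"},
]
print(pick_official_domain(results))  # acme-corp.com
```

Note the domain extraction is a naive "last two labels" heuristic; for multi-part TLDs like `.co.uk` you’d want a proper public-suffix library.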
The input JSON would look something like this, keeping the limits low to minimize costs:

```json
{
  "queries": "company name 1\ncompany name 2\n...",
  "resultsPerPage": 5,
  "maxPagesPerQuery": 1,
  "aiMode": "aiModeOff"
}
```
You can also enable an enrichment for around an additional 0.4 cents per search, which pulls some of the prospects’ contact info and LinkedIn profiles.