I want to build a multi agent AI workflow that scans all of our sitemaps and gives content strategy recommendations

Hey guys, I’ve tried several times to get a workflow working and can’t seem to get it right. My boss has a startup, and the team he hired is new to this: none of us had managed a project this big before. In a content publishing pipeline, that means you get about 500 blog posts in and realize you’ve just tortured everyone on the team. I’m a big believer in content planning now.

So we spend our time rewriting content that has already been written, searching the sitemap to see whether we’ve already used the keywords we want to target, etc.

Here’s what I know would fix it: (bare bones version)

My Workflow:

Step 1: Pull Every URL From The Sitemap

I want n8n to scrape multiple XML sitemaps from several websites and store all URLs separately in organized datasets/tables, one per site.

The workflow would:

  • Fetch sitemap XML

  • Parse all URLs

  • Clean slugs/titles

  • Alert us of any duplicates

  • Categorize URLs by semantic relationship

It would crawl first so the data is as up to date as possible.
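
The fetch/parse/clean/duplicate-check part of Step 1 could be sketched roughly like this in Python (for example inside an n8n Code node, or as a standalone script). This is only a minimal sketch: the function names are made up for illustration, it assumes the sitemap follows the standard sitemaps.org namespace, and it assumes the XML has already been downloaded as text. Note that `.//sm:loc` also matches the `<loc>` entries of a sitemap index file, so an index would need a second pass to fetch the child sitemaps.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Standard namespace used by sitemaps.org-style sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def normalize_url(url: str) -> str:
    """Lowercase the host and drop trailing slash, query string and fragment."""
    parts = urlparse(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"


def slug_to_title(url: str) -> str:
    """Turn the last path segment into a rough, human-readable title."""
    slug = urlparse(url).path.rstrip("/").split("/")[-1]
    return slug.replace("-", " ").replace("_", " ").strip().title()


def find_duplicates(urls: list[str]) -> list[str]:
    """Return normalized URLs that appear more than once."""
    counts = Counter(normalize_url(u) for u in urls)
    return sorted(u for u, n in counts.items() if n > 1)
```

Keeping this step purely deterministic (no AI involved) means the URL list can be trusted as ground truth by everything downstream.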


Step 2: Create A Searchable Content Database

I want all URLs, titles, categories, keywords, and maybe embeddings stored somewhere searchable.

Possible storage:

  • Google Sheets (temporary/testing)

Main goal:
When someone enters a keyword or topic idea, the system instantly checks:

  • Do we already cover this?

  • Did we partially cover it?

  • Is there cannibalization risk?

  • Is this actually a missing cluster?

  • Which page should this topic support internally?
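
To make the "do we already cover this?" check concrete, here is a rough sketch of how a lookup could classify a keyword as covered, partially covered, a cannibalization risk, or a gap. It uses plain word-overlap cosine similarity on titles as a cheap stand-in for real embeddings, and the thresholds (0.8 and 0.4) and the row shape (`url`, `title`) are assumptions that would need tuning against real data.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def check_coverage(query: str, index: list[dict], covered: float = 0.8, partial: float = 0.4) -> dict:
    """Classify a keyword against existing pages (thresholds are assumptions)."""
    q = tokens(query)
    scored = sorted(((cosine(q, tokens(row["title"])), row["url"]) for row in index), reverse=True)
    best_score, best_url = scored[0] if scored else (0.0, None)
    strong = [url for score, url in scored if score >= covered]
    if len(strong) > 1:
        status = "cannibalization_risk"  # several pages already target this
    elif best_score >= covered:
        status = "covered"
    elif best_score >= partial:
        status = "partial"
    else:
        status = "gap"
    return {
        "status": status,
        "best_match": best_url if status != "gap" else None,
        "score": round(best_score, 2),
    }
```

Swapping `tokens`/`cosine` for an embedding lookup later would not change the shape of the result, so the rest of the workflow could stay the same.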


Step 3: AI Decision Agent

This is the part I keep struggling with.

I want an AI agent that:

  1. Receives a content idea or keyword

  2. Chooses the correct website/project

  3. Searches the proper sitemap/content database

  4. Analyzes semantic overlap

  5. Determines:

    • Existing coverage

    • Missing coverage

    • Revenue intent

    • Search intent

    • Suggested cluster placement

    • Internal link opportunities

Basically:
“Should we write this article or not?”

And if yes:

  • Where does it belong?

  • What parent topic supports it?

  • What related articles should exist around it?

  • What pages should link to it?


Step 4: Missing Topic Cluster Generation

Once the system finds a gap, I want it to generate:

  • Missing article ideas

  • Semantic support articles

  • FAQs

  • People Also Ask angles

  • Supporting commercial pages

  • Location modifiers

  • Entity relationships

  • Case study opportunities

I’m heavily influenced by Koray-style topical authority mapping, but I want it practical for real content operations.
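
Some of these, like location modifiers, don't need AI at all. As a small sketch (function names and the "topic in location" title pattern are assumptions for illustration), candidate titles can be generated by combining topics with locations and then filtered against slugs that already exist in the sitemap:

```python
import re
from itertools import product


def slugify(text: str) -> str:
    """Lowercase and replace runs of non-alphanumerics with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def expand_with_locations(topics: list[str], locations: list[str], existing_slugs: set[str]) -> list[str]:
    """Return 'topic in location' titles whose slug is not already published."""
    ideas = []
    for topic, location in product(topics, locations):
        title = f"{topic} in {location}"
        if slugify(title) not in existing_slugs:
            ideas.append(title)
    return ideas
```

An exact slug match is a blunt filter, so near-duplicates with different wording would still need a similarity check.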


Step 5: Content Brief Generation

If approved:
Generate a structured content brief including:

  • Primary keyword

  • Secondary keywords

  • Search intent

  • Funnel stage

  • Recommended headings

  • Internal links

  • Conversion goals

  • CTA suggestions

  • Schema suggestions

  • Competitor gaps

  • Suggested word count

  • Important entities/topics to mention
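
A fixed schema for the brief makes it easier to spot incomplete output before it reaches a writer. The sketch below covers a subset of the fields listed above; the field names, the funnel stage labels, and the validation rules are assumptions, not a standard.

```python
import json
from dataclasses import asdict, dataclass, field

# Assumed funnel labels; replace with whatever the team actually uses.
FUNNEL_STAGES = {"awareness", "consideration", "decision"}


@dataclass
class ContentBrief:
    primary_keyword: str
    search_intent: str
    funnel_stage: str
    secondary_keywords: list[str] = field(default_factory=list)
    headings: list[str] = field(default_factory=list)
    internal_links: list[str] = field(default_factory=list)
    cta: str = ""
    schema_type: str = ""
    word_count: int = 0
    entities: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the brief looks usable."""
        issues = []
        if not self.primary_keyword.strip():
            issues.append("primary_keyword is empty")
        if self.funnel_stage not in FUNNEL_STAGES:
            issues.append(f"unknown funnel_stage: {self.funnel_stage!r}")
        if self.word_count <= 0:
            issues.append("word_count must be positive")
        if not self.headings:
            issues.append("no recommended headings")
        return issues

    def to_json(self) -> str:
        """Serialize to JSON for the next node or a spreadsheet row."""
        return json.dumps(asdict(self))
```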


Current Problems I Keep Running Into

  • Duplicate content ideas

  • Agents forgetting prior sitemap data or adding titles that don’t exist

  • Context windows getting overloaded

  • Slow semantic searches

  • AI generating topics already covered

  • Poor organization between multiple websites

  • Difficulty routing prompts to the correct site database

  • JSON formatting breaking between nodes

  • Embeddings/search setups becoming expensive fast
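
For the JSON breaking between nodes, one defensive pattern is to parse and validate the model's output in a small code step before anything downstream uses it. This is a rough sketch; the required keys (`write`, `reason`, `related_urls`) are a hypothetical output shape, not something n8n defines.

```python
import json
import re

# Hypothetical shape the decision step is expected to return.
REQUIRED_KEYS = {"write": bool, "reason": str, "related_urls": list}


def parse_model_json(raw: str) -> dict:
    """Extract the first {...} span from model output and validate its keys.

    Raises ValueError (json.JSONDecodeError is a subclass) on any problem,
    so the workflow can retry or stop instead of passing bad data along.
    """
    # Models often wrap JSON in prose or a markdown fence; grab the object itself.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"{key} should be {expected_type.__name__}")
    return data
```

Failing loudly at this point is usually cheaper than debugging a broken row three nodes later.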

Is this even the best way to set this up? I really want to get it working. My boss got robbed of $150k in his first year. He doesn’t need more leaks in production, and getting this right would make our content strategy “foolproof and untouchable”.

Any help will be appreciated!

Welcome to the n8n community, @Workflow_Student
I would not start this as a big multi agent workflow. I’d split it into two workflows first. One workflow should ingest and normalize the sitemap data into a real content index with URL, site, title, slug, topic, metadata, and embeddings. The second workflow should receive a keyword, retrieve only the relevant records from that index, and then ask the AI to decide based on those retrieved records. n8n’s AI Agent can use tools and vector stores, but it will not magically remember all sitemap data unless you give it a proper retrieval layer, and the Structured Output Parser is the safer way to keep the final recommendation in a predictable JSON shape. For a first working version, I’d avoid “multi agent” and build a controlled RAG flow with one clear retrieval step, one decision step, and one brief generation step.
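
As a rough illustration of that retrieve-then-decide shape, the sketch below builds a prompt from only the retrieved records and then drops any URL the model returns that was not in that retrieved set. The function names, record fields (`title`, `url`), and prompt wording are assumptions for illustration; the actual model call is left out.

```python
def build_prompt(keyword: str, retrieved: list[dict]) -> str:
    """Build a prompt containing only the retrieved pages, not the whole sitemap."""
    lines = [f"- {r['title']} ({r['url']})" for r in retrieved]
    return (
        f"Keyword: {keyword}\n"
        "Existing pages (the only pages you may reference):\n"
        + "\n".join(lines)
        + "\nDecide whether a new article is needed and list URLs that should link to it."
    )


def keep_grounded_urls(suggested: list[str], retrieved: list[dict]) -> tuple[list[str], list[str]]:
    """Split model-suggested URLs into (kept, dropped) based on the retrieved set."""
    allowed = {r["url"] for r in retrieved}
    kept = [u for u in suggested if u in allowed]
    dropped = [u for u in suggested if u not in allowed]
    return kept, dropped
```

Sending a handful of retrieved records keeps the context window small, and the grounding check catches titles or URLs the model made up.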

Hi — this sounds like it needs a smaller proof-of-concept before a full multi-agent system.

I’d start with: sitemap URL intake → crawl/extract page metadata → cluster pages by topic → AI recommendations with source links → Google Sheet/Doc output. Keep it deterministic first, then add agents only where they improve review or prioritization.

If you can share one sitemap and the desired output format, I can quote a fixed first workflow.

Contact: travisofwork@gmail.com