Hey guys, I’ve tried several times to get this workflow working and can’t seem to get it right. My boss has a startup, and the team he hired is new at this; none of us had managed a big project like this before. In a content publishing pipeline, that means you get about 500 blogs in and realize you’ve just tortured everyone on the team. I’m a big believer in content planning now.
So we spend our time rewriting already-written content, searching the sitemap for keywords we want to target to see if we’ve already used them, etc.
Here’s what I know would fix it (bare-bones version):
My Workflow:
Step 1: Pull Every URL From The Sitemap
I want n8n to scrape multiple XML sitemaps from several websites and store all URLs in organized datasets/tables, kept separate per website.
The workflow would:
- Fetch sitemap XML
- Parse all URLs
- Clean slugs/titles
- Alert us of any duplicates
- Categorize URLs by:
  - Semantic relationship
It will crawl first to get the most up to date info.
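The Step 1 list above (parse, clean, flag duplicates) could be sketched roughly like this in standalone Python. This is not an n8n node, just the logic: fetching is left out, so the XML is passed in as a string. The function names and the sample URLs are made up for illustration, and the duplicate rule (ignore case and trailing slash) is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value in a sitemap (or sitemap index) XML string."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def slug_to_title(url: str) -> str:
    """Turn the last path segment of a URL into a rough, readable title."""
    path = urlparse(url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1]
    return slug.replace("-", " ").replace("_", " ").strip().title()


def find_duplicates(urls: list[str]) -> list[str]:
    """Return URLs that occur more than once, ignoring case and trailing slash."""
    counts = Counter(u.lower().rstrip("/") for u in urls)
    return sorted(u for u, n in counts.items() if n > 1)


# Hypothetical sample data, only here to show the shape of the input.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/content-planning-guide</loc></url>
  <url><loc>https://example.com/blog/content-planning-guide/</loc></url>
  <url><loc>https://example.com/blog/seo_basics</loc></url>
</urlset>"""

urls = parse_sitemap(SAMPLE)
titles = [slug_to_title(u) for u in urls]
duplicates = find_duplicates(urls)
```

Categorizing by semantic relationship is not covered here; that part needs embeddings or a model.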
Step 2: Create A Searchable Content Database
I want all URLs, titles, categories, keywords, and maybe embeddings stored somewhere searchable.
Possible storage:
- Google Sheets (temporary/testing)
Main goal:
When someone enters a keyword or topic idea, the system instantly checks:
- Do we already cover this?
- Did we partially cover it?
- Is there cannibalization risk?
- Is this actually a missing cluster?
- Which page should this topic support internally?
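A minimal sketch of the "do we already cover this?" check, assuming the database is just a mapping of URL to title. It uses plain word overlap as a cheap stand-in for semantic search, so it only catches near-literal matches. The thresholds, the stopword list, and the function names are all assumptions for illustration.

```python
import re

# Small, hypothetical stopword list; a real one would be longer.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "in", "and", "how", "what", "is"}


def tokens(text: str) -> set[str]:
    """Lowercase word set with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def check_coverage(keyword: str, pages: dict[str, str],
                   covered_at: float = 0.8, partial_at: float = 0.4):
    """Return (status, best_url, score), where score is the share of keyword
    words found in the best-matching title. Thresholds are guesses to tune."""
    kw = tokens(keyword)
    if not kw:
        return "missing", None, 0.0
    best_url, best_score = None, 0.0
    for url, title in pages.items():
        score = len(kw & tokens(title)) / len(kw)
        if score > best_score:
            best_url, best_score = url, score
    if best_score >= covered_at:
        status = "covered"
    elif best_score >= partial_at:
        status = "partial"
    else:
        status = "missing"
    return status, best_url, best_score


# Hypothetical index for one site.
pages = {
    "https://example.com/blog/content-planning-guide": "Content Planning Guide",
    "https://example.com/blog/seo-basics": "SEO Basics",
}
```

A "partial" result is the one to look at for cannibalization risk, since the idea overlaps an existing page without matching it outright.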
Step 3: AI Decision Agent
This is the part I keep struggling with.
I want an AI agent that:
- Receives a content idea or keyword
- Chooses the correct website/project
- Searches the proper sitemap/content database
- Analyzes semantic overlap
- Determines:
  - Existing coverage
  - Missing coverage
  - Revenue intent
  - Search intent
  - Suggested cluster placement
  - Internal link opportunities
Basically:
“Should we write this article or not?”
And if yes:
- Where does it belong?
- What parent topic supports it?
- What related articles should exist around it?
- What pages should link to it?
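One way the "write it or not" decision could be kept out of the model's memory is to compute it in code from stored embeddings and hand the agent only the result. The sketch below assumes each site has its own list of (URL, vector) pairs and that the idea has already been embedded by whatever model is in use. The thresholds, the action names, and `decide` itself are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def decide(idea_vec: list[float], site: str, index: dict,
           skip_at: float = 0.90, update_at: float = 0.75, link_at: float = 0.5) -> dict:
    """Compare an idea against ONE site's pages only, so sites never mix.
    index maps site name -> list of (url, vector). Thresholds are guesses."""
    scored = sorted(((cosine(idea_vec, vec), url) for url, vec in index.get(site, [])),
                    reverse=True)
    if not scored:
        return {"site": site, "action": "write", "closest": None,
                "score": 0.0, "link_from": []}
    best_score, best_url = scored[0]
    if best_score >= skip_at:
        action = "skip"              # already covered
    elif best_score >= update_at:
        action = "update_existing"   # partial coverage, cannibalization risk
    else:
        action = "write"             # looks like a gap
    return {"site": site, "action": action, "closest": best_url, "score": best_score,
            "link_from": [url for s, url in scored[:3] if s >= link_at]}


# Toy two-dimensional vectors, purely illustrative.
index = {"site-a": [("https://a.example/x", [1.0, 0.0]),
                    ("https://a.example/y", [0.0, 1.0])]}
```

Revenue intent, search intent, and cluster placement are still judgment calls for the model; the point of the sketch is only that coverage and routing do not have to be.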
Step 4: Missing Topic Cluster Generation
Once the system finds a gap, I want it to generate:
- Missing article ideas
- Semantic support articles
- FAQs
- People Also Ask angles
- Supporting commercial pages
- Location modifiers
- Entity relationships
- Case study opportunities
I’m heavily influenced by Koray-style topical authority mapping, but I want it practical for real content operations.
Step 5: Content Brief Generation
If approved:
Generate a structured content brief including:
- Primary keyword
- Secondary keywords
- Search intent
- Funnel stage
- Recommended headings
- Internal links
- Conversion goals
- CTA suggestions
- Schema suggestions
- Competitor gaps
- Suggested word count
- Important entities/topics to mention
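The brief fields above map fairly directly onto a fixed schema. A minimal sketch, assuming a Python dataclass serialized to JSON; the class name, the field names, and which fields are required are illustrative choices, not a settled format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ContentBrief:
    # Required up front (an assumption; adjust to taste).
    primary_keyword: str
    search_intent: str
    funnel_stage: str
    # Everything else defaults to empty so a partial brief still serializes.
    secondary_keywords: list[str] = field(default_factory=list)
    recommended_headings: list[str] = field(default_factory=list)
    internal_links: list[str] = field(default_factory=list)
    conversion_goals: list[str] = field(default_factory=list)
    cta_suggestions: list[str] = field(default_factory=list)
    schema_suggestions: list[str] = field(default_factory=list)
    competitor_gaps: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    suggested_word_count: Optional[int] = None

    def to_json(self) -> str:
        """Serialize with the same keys every time, whatever was filled in."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical example values.
brief = ContentBrief(
    primary_keyword="content planning",
    search_intent="informational",
    funnel_stage="top",
    internal_links=["https://example.com/blog/content-planning-guide"],
)
brief_json = brief.to_json()
```

Having one fixed set of keys means downstream steps never have to guess whether a field is missing or just empty.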
Current Problems I Keep Running Into
- Duplicate content ideas
- Agents forgetting prior sitemap data or adding titles that don’t exist
- Context windows getting overloaded
- Slow semantic searches
- AI generating topics already covered
- Poor organization between multiple websites
- Difficulty routing prompts to the correct site database
- JSON formatting breaking between nodes
- Embeddings/search setups becoming expensive fast
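Two of the problems above (broken JSON between steps, and titles or URLs that don't exist) can at least be caught with a validation step between the agent and whatever consumes its output. A rough sketch, assuming the agent is asked to return a JSON object with `action` and `internal_links` keys; those key names and the function names are hypothetical.

```python
import json

# Built this way only to avoid typing a literal code fence inside a code block.
FENCE = "`" * 3


def extract_json(raw: str):
    """Parse model output as JSON, tolerating a surrounding markdown code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]           # drop opening fence (and language tag)
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]                  # drop closing fence
        text = "\n".join(lines)
    return json.loads(text)                     # raises ValueError on bad JSON


def validate_agent_output(raw: str, known_urls: set[str],
                          required=("action", "internal_links")) -> dict:
    """Fail loudly on malformed output and drop links that are not in the sitemap."""
    data = extract_json(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in required if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    links = data["internal_links"]
    data["rejected_links"] = [u for u in links if u not in known_urls]
    data["internal_links"] = [u for u in links if u in known_urls]
    return data


# Hypothetical agent reply that wraps its JSON in a fence and invents one URL.
known = {"https://example.com/a"}
raw = (FENCE + 'json\n{"action": "write", "internal_links": '
       '["https://example.com/a", "https://example.com/made-up"]}\n' + FENCE)
result = validate_agent_output(raw, known)
```

Anything that lands in `rejected_links` is a URL the agent made up or misremembered, which makes it a useful thing to log.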
Is this even the best way to set this up? I really want to get it working. My boss got robbed of $150k in his first year. He doesn’t need leaks in production, and getting this right would make our content strategy “foolproof and untouchable”.
Any help will be appreciated!