Hi n8n community!
First post here!
I wanted to share a use case I recently worked on and originally wrote about on LinkedIn: an AI legal assistant that can answer questions about procurement laws including all their later amendments, not just the original law. (The LinkedIn post, by Karim Nabil, is in Arabic; in short: "The most complex project I've worked on so far! A while ago an official from Qatar contacted me wanting an app that helps quickly extract laws from files…")
The domain: government tender / procurement laws, where you typically have:

- A base law / regulation (e.g. from 2015)
- Multiple amendments in 2019, 2024, etc.
- Amendments that say things like: "Article 9: The text of Article 5 is amended and a new clause is added…"
If you do naïve RAG here, you often retrieve the original article text but lose the amendments, so the answer is simply wrong.
## The problem
My first attempt was the "obvious" one:

- Parse PDFs
- Chunk them
- Embed + store in a vector DB (I'm using Supabase + pgvector)
- Do semantic search + answer with an LLM
Result: wrong or incomplete answers whenever amendments were involved.
Why?

- The original article contains the full context → has a high semantic score
- Amendments are short and referential ("Article 9 amends Article 5…")
- A plain vector search happily retrieves the original article and ignores its later amendments
So we need something that understands relationships between articles, not just similar chunks.
## High-level solution
I ended up with:
Semantic Chunking + Nodes & Relations → Custom Knowledge Graph → Relation-aware Retrieval
Roughly:

1. Semantic chunking (LLM-driven)
   - Instead of fixed-size chunks, an LLM reads each law file and splits it into articles (each article = one semantic chunk with all its clauses).
   - For each chunk, I generate rich metadata, including:
     - Article number
     - Law version / year
     - What other articles it refers to
     - Type of relation (e.g. amends, adds, cancels, clarifies, …)
2. Nodes & relations (knowledge graph)
   - Each article becomes a node.
   - Relations are stored like:
     - `5 → 9` (Article 9 amends Article 5)
     - `9 → 5` (reverse edge)
   - Stored in Supabase (pgvector for embeddings + relational tables for edges).
3. Hybrid retrieval with relations
   - When a user asks a question, I:
     1. Do hybrid search (semantic + keyword/filters) over articles.
     2. Expand via relations: pull connected nodes (amendments, clarifications, etc.).
     3. Give the LLM a bundle: original article + all its amendments.
   - The LLM can then answer with the current, consolidated law, not the outdated version.
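The relation-expansion step above can be sketched as a small graph walk. This is a minimal illustration, not my production code: the edge shape (`source`/`target`/`type`) and the `maxHops` cutoff are assumptions you would adapt to your own tables.

```javascript
// Expand a set of retrieved article IDs along relation edges.
// Edge rows look like { source: "5", target: "9", type: "amends" }
// (illustrative shape, not the exact schema).
function expandByRelations(seedIds, edges, maxHops = 2) {
  const collected = new Set(seedIds);
  let frontier = [...seedIds];

  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next = [];
    for (const id of frontier) {
      for (const e of edges) {
        // Follow edges in both directions so an amendment pulls in
        // the article it amends, and vice versa.
        const neighbor =
          e.source === id ? e.target : e.target === id ? e.source : null;
        if (neighbor && !collected.has(neighbor)) {
          collected.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return [...collected];
}

// Example: hybrid search returned Article 5; edges link 5↔9 and 9↔14,
// so two hops also pull in the clarification in Article 14.
const sampleEdges = [
  { source: "5", target: "9", type: "amends" },
  { source: "9", target: "14", type: "clarifies" },
];
```

Capping the hop count keeps one noisy edge from dragging in half the law; in practice a depth of 1–2 covers "article + its amendments + their clarifications".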
This is the conceptual pipeline I described in the LinkedIn post.
## Where n8n comes in
Even though the core semantics/graph logic sit in my own service + Supabase, n8n is the backbone that glues everything together:
### 1. Ingestion & parsing flow

- Trigger: manual, cron, or "new PDF uploaded" event
- Nodes:
  - `HTTP Request` → send the PDF to a parsing service (or a code-based microservice)
  - `Code` node → normalize the parsed structure, extract article metadata
  - `HTTP Request` → upsert parsed articles into Supabase tables
- n8n handles batching, retries, and monitoring.
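The normalization `Code` node can be as simple as splitting the parser's text output on article headings. A hedged sketch, where the heading regex and field names are illustrative and would need tuning for real documents:

```javascript
// n8n Code-node style sketch: split raw parsed law text into one
// object per article. Assumes the parser returns plain text with
// headings like "Article 5: ..."; the regex is an assumption.
function splitIntoArticles(rawText, lawYear) {
  // Zero-width lookahead keeps the "Article N:" heading with its body.
  const parts = rawText.split(/(?=Article\s+\d+\s*:)/g);
  return parts
    .map((p) => p.trim())
    .filter((p) => /^Article\s+\d+\s*:/.test(p))
    .map((p) => {
      const num = p.match(/^Article\s+(\d+)/)[1];
      return { article_id: num, law_year: lawYear, text: p };
    });
}

// In an actual n8n Code node you would typically return
// splitIntoArticles($json.text, $json.law_year).map((a) => ({ json: a }));
```

Note the `\s*:` in the lookahead: a cross-reference like "The text of Article 5 is amended" has no colon, so it does not trigger a false split.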
### 2. Semantic chunking + graph building

For each article row in Supabase:

- `Supabase`/`HTTP Request` node → fetch a batch of raw articles
- `OpenAI` (or your LLM of choice) node → ask the model to:
  - Clean up the article text
  - Identify related articles (e.g., "this article amends Article 5")
  - Emit JSON matching a schema like: `{ "article_id": "5", "text": "...", "relations": [ {"target_article_id": "9", "type": "amends"} ] }`
- `Code` node →
  - Insert / update the article text & embedding (via Supabase RPC / HTTP)
  - Insert relation edges in a separate table

So you end up with a vector store + a mini knowledge graph, orchestrated by n8n.
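As a sketch of that last `Code` node: turn the LLM's JSON into one article row plus edge rows, materializing the reverse edge so lookups work from either side. Table and column names here are illustrative, not my actual schema:

```javascript
// Convert one parsed LLM output (matching the JSON schema above) into
// rows for two tables: articles and relation edges. For every relation
// we also write the reverse edge (5 → 9 and 9 → 5), mirroring the
// graph design described earlier. Names are illustrative.
function toRows(parsed) {
  const articleRow = { article_id: parsed.article_id, text: parsed.text };
  const edgeRows = [];
  for (const r of parsed.relations) {
    edgeRows.push({
      source: parsed.article_id,
      target: r.target_article_id,
      type: r.type,
    });
    // Reverse edge; suffixing the type (assumed convention) keeps the
    // direction of the original relation recoverable.
    edgeRows.push({
      source: r.target_article_id,
      target: parsed.article_id,
      type: `${r.type}_by`,
    });
  }
  return { articleRow, edgeRows };
}
```

The actual inserts then go through Supabase RPC / HTTP as described above; this function only shapes the payload.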
### 3. Q&A / Agent layer

For user queries (via WhatsApp / Telegram / web app):

- Trigger: `Webhook` node (or e.g. `Telegram Trigger`, WhatsApp via an API)
- `Code`/`HTTP Request` node → call a Supabase RPC to:
  - Run hybrid vector search
  - Expand by relations (collect all connected amendments)
- `OpenAI` node →
  - Give it the question + all relevant articles + amendments
  - Ask for a grounded legal answer + a list of which articles it used
- `Webhook Reply` / messaging node → send the answer back to the user.
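The prompt-assembly step before the `OpenAI` node can look roughly like this. The wording and the `relation` label are assumptions; the point is labeling each article so the model can cite what it applied:

```javascript
// Build the grounded prompt: the user question plus every retrieved
// article and its amendments, each tagged with its article number so
// the model can list which articles it relied on. Wording is a sketch.
function buildPrompt(question, articles) {
  const context = articles
    .map(
      (a) =>
        `[Article ${a.article_id}${a.relation ? `, ${a.relation}` : ""}]\n${a.text}`
    )
    .join("\n\n");
  return [
    "Answer strictly from the articles below. Apply amendments over the base text.",
    "End with 'Applied articles: ...' listing the article numbers you relied on.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```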
## Mini example (simplified)
User question:

"What does the law say about extending a contract beyond its original duration under the procurement regulation?"

Retrieval bundle (simplified):

- Article 5 – base rule on contract duration
- Article 9 – amendment: "Article 5 is amended to allow extensions up to X under conditions Y and Z"
- Article 14 – clarification on exceptional cases
LLM answer (short version):

> The original regulation limited contract duration to X months with no explicit extension mechanism (Article 5).
> However, this was amended by Article 9, which allows extensions up to Y months, provided that:
> – the extension is justified by [conditions]
> – approved by [authority]
> – does not exceed a total duration of [limit]
>
> Article 14 further clarifies that in exceptional cases (e.g. force majeure), the supervising authority can approve additional extensions subject to [constraints].
>
> Applied articles: 5 (as amended), 9, 14.
This is the kind of “time-aware” reasoning that plain RAG was failing at.
## What I'd love feedback on
- n8n patterns for knowledge graphs: has anyone here tried modeling (and keeping in sync) nodes + relations through n8n in a more standardized way?
- Scaling pgvector vs. an external vector DB: right now I'm on pgvector in Supabase. Curious whether others in the community moved a similar use case to Qdrant/Weaviate/etc., and why.
- Best practices for legal / high-risk domains: guardrails, logging, human-in-the-loop. Any n8n patterns you like for "sensitive" AI agents (legal / compliance / finance)?
Happy to share more details on the Supabase schema, n8n workflow screenshots, or the prompt structures if anyone's interested!