Hi n8n community!
First post here!
I wanted to share a use case I recently worked on and originally wrote about on LinkedIn: an AI legal assistant that can answer questions about procurement laws including all their later amendments, not just the original law. (The LinkedIn post, by Karim Nabil, is in Arabic; in short: "The most complex project I've worked on so far! A while ago an official from Qatar contacted me wanting an app that helps quickly extract laws from files…")
The domain: government tender / procurement laws, where you typically have:

- A base law / regulation (e.g. from 2015)
- Multiple amendments in 2019, 2024, etc.
- Amendments that say things like: "Article 9: The text of Article 5 is amended and a new clause is added…"
If you do naïve RAG here, you often retrieve the original article text but lose the amendments, so the answer is simply wrong.
## The problem
My first attempt was the "obvious" one:

- Parse PDFs
- Chunk them
- Embed + store in a vector DB (I'm using Supabase + pgvector)
- Do semantic search + answer with an LLM
Result: wrong or incomplete answers whenever amendments were involved.
Why?

- The original article contains the full context → has a high semantic score
- Amendments are short and referential ("Article 9 amends Article 5…")
- A plain vector search happily retrieves the original article and ignores its later amendments
So we need something that understands relationships between articles, not just similar chunks.
## High-level solution
I ended up with:
Semantic Chunking + Nodes & Relations → Custom Knowledge Graph → Relation-aware Retrieval
Roughly:

1. Semantic chunking (LLM-driven)
   - Instead of fixed-size chunks, an LLM reads each law file and splits it into articles (each article = one semantic chunk with all its clauses).
   - For each chunk, I generate rich metadata, including:
     - Article number
     - Law version / year
     - What other articles it refers to
     - Type of relation (e.g. amends, adds, cancels, clarifies, …)
2. Nodes & relations (knowledge graph)
   - Each article becomes a node.
   - Relations are stored like:
     - `5 → 9` (Article 9 amends Article 5)
     - `9 → 5` (reverse edge)
   - Stored in Supabase (pgvector for embeddings + relational tables for edges).
3. Hybrid retrieval with relations
   - When a user asks a question, I:
     1. Do hybrid search (semantic + keyword/filters) over articles.
     2. Expand via relations: pull connected nodes (amendments, clarifications, etc.).
     3. Give the LLM a bundle: original article + all its amendments.
   - The LLM can then answer with the current, consolidated law, not the outdated version.
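The relation-expansion step above can be sketched as a small graph walk. This is a minimal illustration, not my production code: the edge shape (`source`/`target`/`type`) and the `maxHops` cutoff are assumptions you would adapt to your own tables.

```javascript
// Expand a set of retrieved article IDs along relation edges.
// Edge rows look like { source: "5", target: "9", type: "amends" }
// (illustrative shape, not the exact schema).
function expandByRelations(seedIds, edges, maxHops = 2) {
  const collected = new Set(seedIds);
  let frontier = [...seedIds];

  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next = [];
    for (const id of frontier) {
      for (const e of edges) {
        // Follow edges in both directions so an amendment pulls in
        // the article it amends, and vice versa.
        const neighbor =
          e.source === id ? e.target : e.target === id ? e.source : null;
        if (neighbor && !collected.has(neighbor)) {
          collected.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return [...collected];
}

// Example: hybrid search returned Article 5; edges link 5↔9 and 9↔14,
// so two hops also pull in the clarification in Article 14.
const sampleEdges = [
  { source: "5", target: "9", type: "amends" },
  { source: "9", target: "14", type: "clarifies" },
];
```

Capping the hop count keeps one noisy edge from dragging in half the law; in practice a depth of 1–2 covers "article + its amendments + their clarifications".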
This is the conceptual pipeline I described in the LinkedIn post.
## Where n8n comes in
Even though the core semantics/graph logic sit in my own service + Supabase, n8n is the backbone that glues everything together:
### 1. Ingestion & parsing flow

- Trigger: manual, cron, or "new PDF uploaded" event
- Nodes:
  - `HTTP Request` → send the PDF to a parsing service (or a code-based microservice)
  - `Code` node → normalize the parsed structure, extract article metadata
  - `HTTP Request` → upsert parsed articles into Supabase tables
- n8n handles batching, retries, and monitoring.
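The normalization `Code` node can be as simple as splitting the parser's text output on article headings. A hedged sketch, where the heading regex and field names are illustrative and would need tuning for real documents:

```javascript
// n8n Code-node style sketch: split raw parsed law text into one
// object per article. Assumes the parser returns plain text with
// headings like "Article 5: ..."; the regex is an assumption.
function splitIntoArticles(rawText, lawYear) {
  // Zero-width lookahead keeps the "Article N:" heading with its body.
  const parts = rawText.split(/(?=Article\s+\d+\s*:)/g);
  return parts
    .map((p) => p.trim())
    .filter((p) => /^Article\s+\d+\s*:/.test(p))
    .map((p) => {
      const num = p.match(/^Article\s+(\d+)/)[1];
      return { article_id: num, law_year: lawYear, text: p };
    });
}

// In an actual n8n Code node you would typically return
// splitIntoArticles($json.text, $json.law_year).map((a) => ({ json: a }));
```

Note the `\s*:` in the lookahead: a cross-reference like "The text of Article 5 is amended" has no colon, so it does not trigger a false split.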
### 2. Semantic chunking + graph building

For each article row in Supabase:

- `Supabase`/`HTTP Request` node → fetch a batch of raw articles
- `OpenAI` (or your LLM of choice) node → ask the model to:
  - Clean up the article text
  - Identify related articles (e.g., "this article amends Article 5")
  - Emit JSON matching a schema like: `{ "article_id": "5", "text": "...", "relations": [ {"target_article_id": "9", "type": "amends"} ] }`
- `Code` node →
  - Insert / update the article text & embedding (via Supabase RPC / HTTP)
  - Insert relation edges in a separate table

So you end up with a vector store + a mini knowledge graph, orchestrated by n8n.
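As a sketch of that last `Code` node: turn the LLM's JSON into one article row plus edge rows, materializing the reverse edge so lookups work from either side. Table and column names here are illustrative, not my actual schema:

```javascript
// Convert one parsed LLM output (matching the JSON schema above) into
// rows for two tables: articles and relation edges. For every relation
// we also write the reverse edge (5 → 9 and 9 → 5), mirroring the
// graph design described earlier. Names are illustrative.
function toRows(parsed) {
  const articleRow = { article_id: parsed.article_id, text: parsed.text };
  const edgeRows = [];
  for (const r of parsed.relations) {
    edgeRows.push({
      source: parsed.article_id,
      target: r.target_article_id,
      type: r.type,
    });
    // Reverse edge; suffixing the type (assumed convention) keeps the
    // direction of the original relation recoverable.
    edgeRows.push({
      source: r.target_article_id,
      target: parsed.article_id,
      type: `${r.type}_by`,
    });
  }
  return { articleRow, edgeRows };
}
```

The actual inserts then go through Supabase RPC / HTTP as described above; this function only shapes the payload.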
### 3. Q&A / Agent layer

For user queries (via WhatsApp / Telegram / web app):

- Trigger: `Webhook` node (or e.g. `Telegram Trigger`, WhatsApp via an API)
- `Code`/`HTTP Request` node → call a Supabase RPC to:
  - Run hybrid vector search
  - Expand by relations (collect all connected amendments)
- `OpenAI` node →
  - Give it the question + all relevant articles + amendments
  - Ask for a grounded legal answer + a list of which articles it used
- `Webhook Reply` / messaging node → send the answer back to the user.
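The prompt-assembly step before the `OpenAI` node can look roughly like this. The wording and the `relation` label are assumptions; the point is labeling each article so the model can cite what it applied:

```javascript
// Build the grounded prompt: the user question plus every retrieved
// article and its amendments, each tagged with its article number so
// the model can list which articles it relied on. Wording is a sketch.
function buildPrompt(question, articles) {
  const context = articles
    .map(
      (a) =>
        `[Article ${a.article_id}${a.relation ? `, ${a.relation}` : ""}]\n${a.text}`
    )
    .join("\n\n");
  return [
    "Answer strictly from the articles below. Apply amendments over the base text.",
    "End with 'Applied articles: ...' listing the article numbers you relied on.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```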
## Mini example (simplified)
User question:

"What does the law say about extending a contract beyond its original duration under the procurement regulation?"

Retrieval bundle (simplified):

- Article 5 – base rule on contract duration
- Article 9 – amendment: "Article 5 is amended to allow extensions up to X under conditions Y and Z"
- Article 14 – clarification on exceptional cases
LLM answer (short version):

> The original regulation limited contract duration to X months with no explicit extension mechanism (Article 5).
> However, this was amended by Article 9, which allows extensions up to Y months, provided that:
> – the extension is justified by [conditions]
> – approved by [authority]
> – does not exceed a total duration of [limit]
>
> Article 14 further clarifies that in exceptional cases (e.g. force majeure), the supervising authority can approve additional extensions subject to [constraints].
>
> Applied articles: 5 (as amended), 9, 14.
This is the kind of “time-aware” reasoning that plain RAG was failing at.
## What I'd love feedback on
- n8n patterns for knowledge graphs: has anyone here tried modeling (and keeping in sync) nodes + relations through n8n in a more standardized way?
- Scaling pgvector vs. an external vector DB: right now I'm on pgvector in Supabase. Curious whether others in the community moved a similar use case to Qdrant/Weaviate/etc., and why.
- Best practices for legal / high-risk domains: guardrails, logging, human-in-the-loop. Any n8n patterns you like for "sensitive" AI agents (legal / compliance / finance)?
Happy to share more details on the Supabase schema, n8n workflow screenshots, or the prompt structures if anyone's interested!