How I solved persistent memory for AI agents in n8n — dual-layer Postgres + Supabase pgvector pattern (Openclaw in n8n)

Hey everyone,

After months of experimenting with AI agents in n8n, the hardest problem I kept hitting was memory — specifically making agents feel continuous across sessions without destroying performance or bloating every prompt.

Wanted to share the pattern I finally landed on that actually works in production.

The Problem with Single-Layer Memory

Most n8n AI agent tutorials use either:

  • The built-in memory buffer window (dies when session ends)

  • A Google Sheet or Airtable as memory (slow, no semantic search, doesn’t scale)

  • Just dumping everything into context (expensive, hits token limits fast)

None of these work well once your agent has weeks of history and needs to recall something specific from 3 conversations ago.

The Two-Layer Solution

Layer 1 — Postgres (Short-Term)

Use n8n’s native memoryPostgresChat node. Set a custom session key (I use the username so each user gets their own history) and a context window of 15 messages.

This loads instantly on every session. The agent always has the last 15 turns available without any extra calls. It handles “what did I just ask you?” perfectly.
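For reference, the chat-history table this node reads and writes looks roughly like the sketch below (my understanding of n8n's default schema; verify the column names against your own instance). The nightly pipeline later reads from it by id.

```sql
-- Approximate shape of the table the Postgres chat memory node maintains
-- (assumed schema; check your instance before relying on it).
create table if not exists n8n_chat_histories (
  id serial primary key,
  session_id varchar(255) not null, -- the custom session key (username)
  message jsonb not null            -- one chat turn: role + content
);
```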

Layer 2 — Supabase pgvector (Long-Term)

This is where it gets interesting. You need to:

  1. Set up a documents table in Supabase with the pgvector extension enabled

  2. Create a match_documents similarity search function

  3. Build a scheduled n8n pipeline that runs nightly

The nightly pipeline works like this:

Schedule trigger
→ Get last_vector_id from your init table
→ Query Postgres: SELECT * FROM n8n_chat_histories WHERE id > last_vector_id
→ Aggregate all messages into one block
→ LLM summarizer (I use Claude Haiku — cheap and fast)
→ Structured output parser (extracts summary field)
→ OpenAI embeddings node
→ Insert into Supabase vector store
→ Update last_vector_id so next run only processes new messages
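Here is a minimal SQL sketch of the checkpoint bookkeeping around that pipeline. The init table layout and names are illustrative, not prescriptive; adapt them to your schema.

```sql
-- One-row checkpoint table: the highest chat-history id already
-- summarized and embedded (illustrative names).
create table if not exists init (
  id int primary key default 1,
  last_vector_id bigint not null default 0
);
insert into init (id) values (1) on conflict (id) do nothing;

-- Pipeline read: only messages newer than the checkpoint.
select h.*
from n8n_chat_histories h
where h.id > (select last_vector_id from init where id = 1)
order by h.id;

-- Pipeline write: advance the checkpoint so the next run only
-- processes messages created after tonight's batch.
update init
set last_vector_id = (select coalesce(max(id), 0) from n8n_chat_histories)
where id = 1;
```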

Then on the agent side, expose the Supabase vector store as a retrieval tool — not a memory loader. The agent calls it when it needs to recall something specific. This way you’re not stuffing old conversation summaries into every single prompt unnecessarily.

Why This Works

  • Short-term (Postgres): handles in-session continuity, fast, no API calls

  • Long-term (Supabase): handles cross-session recall, semantic search by meaning not keywords

The agent can answer “what did we decide about X three weeks ago?” because it searches by meaning across all past conversations — not just keyword matching in a database.

The Supabase SQL you need:

```sql
create extension if not exists vector;

create table documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)
);

create function match_documents (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb default '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    -- table-qualified columns: RETURNS TABLE declares plpgsql variables
    -- named id/content/metadata, which would otherwise be ambiguous here
    documents.id, documents.content, documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
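As a quick smoke test, you can call the function directly with a placeholder embedding. The constant vector below just stands in for a real query embedding from the OpenAI embeddings node; the third argument is the jsonb containment filter.

```sql
-- Smoke test with a placeholder embedding. In production the 1536-dim
-- vector comes from the embeddings node, not a constant array.
select id, content, similarity
from match_documents(
  array_fill(0.1, array[1536])::vector(1536), -- placeholder query embedding
  5,                                          -- top 5 matches
  '{}'::jsonb  -- empty filter matches all rows; pass e.g.
               -- '{"user": "alice"}'::jsonb to restrict by metadata
);
```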

One gotcha:

The last_vector_id tracking is critical. If your init table lookup fails and returns null, your pipeline will try to embed the entire history every single run. Always set a fallback to 0 and add error handling around the SQL query node.
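In SQL terms, the guard is just a coalesce around the checkpoint read (same illustrative init table as in the sketch above):

```sql
-- Defensive read of the checkpoint: a missing row or null value
-- falls back to 0 instead of breaking the id > last_vector_id filter.
select coalesce(
  (select last_vector_id from init where id = 1),
  0
) as last_vector_id;
```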

Hope this helps someone — spent a long time getting this right.

I used this pattern inside a larger workflow I built (AI agent + lead gen + outreach + Gmail + Docs). If anyone’s curious about the full system, I packaged it up and it’s available here: https://n8nclaw.vercel.app

Really useful writeup — the dual-layer approach is exactly right and closely matches what we ended up with in production too. The last_vector_id tracking gotcha you mentioned bit us the first time we forgot the fallback to 0. One thing we added to the nightly pipeline is a lightweight importance filter before the embedding step — the summarization LLM returns a simple score alongside the summary, and conversations below a threshold (small talk, one-word confirmations) get skipped entirely. Keeps the vector store clean and retrieval quality stays high even after months of accumulated data.
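Roughly, the skip step boils down to something like the query below (the staging table and the importance field are illustrative, not our exact schema):

```sql
-- Illustrative importance gate: the summarizer writes a 0-1 score into
-- the summary's metadata, and only rows above the threshold get embedded
-- and inserted. staged_summaries is a hypothetical staging table.
insert into documents (content, metadata, embedding)
select s.content, s.metadata, s.embedding
from staged_summaries s
where (s.metadata->>'importance')::float >= 0.4;
```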

Great architecture. The two-layer approach — short-term buffer with a message window + nightly summarization to vectors — maps closely to how production AI coding tools handle this at scale.

The Claude Code source code leaked yesterday, and their compaction engine uses a similar philosophy but with three tiers: deterministic tool result clearing before every API call (no model involved, just keeps the 5 most recent), server-side token threshold cleanup, and full LLM summarization as last resort.

The most interesting design choice was around cache economics. When the prompt cache is warm, they don’t modify messages at all — instead they queue `cache_edits` that tell the server to delete specific tool results by ID, surgically, without touching the cached prefix. Warm cache = 90% token discount, so preserving it drives the entire architecture.

Your decision to expose the vector store as a retrieval tool rather than a memory loader follows the same principle — don’t stuff old context into every prompt. Only pull what’s relevant when the agent needs it.

Full breakdown of the 3-tier system here: Claude Code's Compaction Engine: What the Source Code Actually Reveals