Implementing Anthropic-style Agent Skills in n8n: a deep technical walkthrough with Postgres + progressive disclosure

:warning: Work in progress / testing in progress. This is a live write-up of a pattern I’m actively building and stress-testing on a self-hosted n8n instance. Schema, configuration details, recommended thresholds, and limits sections will likely change as I run into real-world edge cases during testing. Treat this as a working blueprint open to iteration, not a finalised guide. I’ll keep editing this post as the tests progress — feedback, pushback, and “you’re wrong about X” comments are very welcome.

This is a long one. Skip to section 7 for the full workflow blueprint, or section 9 if you only care about the limits.

TL;DR

I built a pattern inside n8n self-hosted that lets a single AI Agent node behave like a specialist across dozens of distinct tasks, without ballooning the system prompt. The trick is borrowing Anthropic’s Agent Skills pattern and implementing progressive disclosure entirely with native n8n nodes (AI Agent, Call n8n Workflow Tool, Execute Workflow Trigger, Postgres). No custom code beyond a small Code node.


1. The problem I was solving

I had an AI Agent node powering an internal assistant that needs to handle: writing meta descriptions in a specific brand voice, generating PDFs from templates, drafting emails, doing competitor research, etc. Each of those tasks has its own rules — voice, format, edge cases, examples.

I tried the obvious thing first: cram everything into the system prompt. The system message grew past 8000 tokens. Every single turn paid that cost, even when the user just asked “what’s the weather”. Latency went up, costs went up, and the model started getting confused by irrelevant instructions bleeding into unrelated tasks.

I needed a way to load the right context only when needed.

2. The pattern: progressive disclosure

Anthropic recently published their Agent Skills spec. The core insight is progressive disclosure: split context loading into three layers.

  • Layer 1, the manifest: name + description only. ~30-50 tokens per skill, loaded every turn.

  • Layer 2, the skill body: full instructions (a SKILL.md equivalent). ~500-2000 tokens, loaded when the LLM decides this skill matches.

  • Layer 3, references: examples, edge cases, long lists. 500-5000+ tokens, loaded only when the loaded skill explicitly needs them.

The agent’s system prompt only carries layer 1. Layers 2 and 3 enter context as tool call results, which means they’re scoped to the current run and don’t pollute future turns unless the agent re-fetches them.
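
To make layer 1 concrete: the manifest is just a compact list of name/description pairs, something like this (the second skill and both description strings are illustrative, not from my actual catalog):

```json
[
  {
    "name": "wwu-meta-description",
    "description": "Writes SEO meta descriptions in the WWU brand voice. Use for any meta description request."
  },
  {
    "name": "wwu-pdf-report",
    "description": "Generates a PDF deliverable from the internal report template."
  }
]
```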

3. Why I chose Postgres (and not Data Tables)

n8n has had Data Tables since around v1.50 — native, simple, no extra infra. I considered them seriously. Here’s what they offer, what their documented limits are, and why I still went with a separate Postgres container for this specific use case.

What Data Tables offer

  • Zero extra infrastructure

  • Visual editing right in the n8n UI

  • Native CSV import and export from the UI

  • Project-scoped access (Personal-space tables are owner-only)

  • A public REST endpoint, /datatables, for programmatic access from outside n8n

  • Great for prototyping or for catalogs of a few dozen entries

Documented limits to weigh

These are the constraints I could verify in the official docs at the time of writing:

  • Column types: Boolean, Date, Number, String. No JSON, no VECTOR, no BLOB. So no embeddings on the same table, no arbitrary structured payloads.

  • Filter set: Equals, Not Equals, Greater/Less Than (with or without equal), Is Empty, Is Not Empty. No LIKE, no ILIKE, no full-text matching.

  • Operations: Insert, Update, Upsert, Delete, Get on rows; Create, Delete, List, Update on tables. No raw SQL.

  • Storage cap: by default the total storage used by all Data Tables in an instance is limited to 50MB. On self-hosted instances you can raise it via the N8N_DATA_TABLES_MAX_SIZE_BYTES environment variable. Verbatim from the docs: “Exceeding this limit will disable manual additions to tables and cause workflow execution errors during attempts to insert or update data.”

  • No Code node access: verbatim from the docs, “Direct programmatic access to data tables from a Code node isn’t supported. You can’t access data table values via built-in methods or variables.” This matters for the architecture I describe in section 7, where a Code node builds the manifest. On Data Tables you’d swap that for a Set node after a Get-many — workable, but a different shape.

Things the docs don’t explicitly cover

To be honest, I couldn’t find authoritative documentation on:

  • Native versioning, history, audit log, or change tracking

  • Concurrency / lock behaviour under high write contention

  • Whether n8n backups include Data Tables payloads (worth verifying for your environment)

If anyone has links I missed on these points, I’d love to update this section.

Why Postgres still wins for this use case

For me, on this specific workload, Postgres still wins. Concretely:

  1. The 50MB default storage cap is raisable on self-hosted, but it’s still a ceiling I’d rather not have to monitor for a system that may grow into thousands of skill versions in a history table.

  2. I want server-side similarity search via pgvector once the catalog grows past ~100 skills, not just LLM-side reasoning over a long manifest.

  3. I want my Code nodes free to read the data directly, including for export jobs and admin tooling.

  4. I want raw SQL for the day I need an aggregation, a JOIN, or a custom history trigger.

  5. The history-trigger pattern is a ten-line PL/pgSQL function in Postgres; on Data Tables it has to be reproduced in workflows.

If the goal is a small, fixed catalog (under ~30 skills) without growth pressure, Data Tables are a perfectly defensible choice. For a long-lived production system that I want to extend, I take the extra container.

4. Architecture overview

Three workflows plus one Postgres container. (The fourth box in the diagram, admin ops, stands in for optional workflows that manage the skill rows; they aren’t covered in this post.)

+------------------------+
|   wwu-skills-db        |   pgvector image, separate container
|   namespace: skills    |   same docker network as n8n
+------------------------+
            ^
            |
    +-------+--------+----------+
    |       |        |          |
+-----+ +-------+ +-------+ +-------+
| MAIN| | LOAD  | | LOAD  | | (admin|
| AGT | | SKILL | | REF   | |  ops) |
+-----+ +-------+ +-------+ +-------+

  • MAIN_AGENT: chat-facing workflow with the AI Agent node.

  • tool_load_skill: sub-workflow exposed as a tool. Takes skill_name, returns the skill’s full content.

  • tool_load_reference: sub-workflow exposed as a tool. Takes skill_name + reference_name, returns the reference content.

5. The Postgres schema (high level)

The schema lives in a dedicated namespace (call it skills_mgmt). It has three tables.

  • skills is the primary store. Its primary key is the name (a short slug like wwu-meta-description). Other columns: description for the manifest, the full content, an active boolean flag, a monotonic version integer, and the usual created_at / updated_at timestamps. A regex check on name enforces a safe identifier format (lowercase, digits, hyphens) at DB level; more on why below. There’s also an embedding column reserved for future similarity search, nullable, dimensioned to whatever model you’ll use down the road.

  • skill_references is the child table for the on-demand reference files. It’s linked to skills with cascade-on-delete behaviour: removing a skill drops its references too. Uniqueness is enforced on the pair (skill_name, reference_name), and a regex check validates reference_name (lowercase, digits, dots, hyphens, underscores).

  • skills_history is the archival table. A BIGSERIAL primary key plus a copy of the relevant fields and an archived_at timestamp. Filled automatically by a trigger on the skills table.

The history mechanism is a single stored function attached to a trigger that fires before each update. On every change to a skill, the function copies the previous row into skills_history, increments the version counter, and refreshes the timestamp. Net effect: the history table is a complete, lossless audit log, and it requires zero workflow-side bookkeeping.

One partial index on active keeps manifest reads fast even after hundreds of soft-deleted skills accumulate.

The regex checks on name and reference_name aren’t strictly necessary because we use parameterised queries, but they’re a useful belt-and-braces measure: even if someone bypasses the workflow and inserts directly, you can’t smuggle weird identifiers in.
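
Putting that together, here is a minimal sketch of what the DDL might look like. Exact column types, the embedding dimension, and the index/trigger names are my assumptions; the authoritative version is in the Gist.

```sql
-- Minimal sketch of the schema described above; types, the embedding
-- dimension, and object names are assumptions, not the Gist version.
CREATE EXTENSION IF NOT EXISTS vector;  -- only needed for the embedding column
CREATE SCHEMA IF NOT EXISTS skills_mgmt;

CREATE TABLE skills_mgmt.skills (
    name        TEXT PRIMARY KEY CHECK (name ~ '^[a-z0-9-]+$'),
    description TEXT NOT NULL,                 -- what the manifest exposes
    content     TEXT NOT NULL,                 -- the SKILL.md equivalent
    active      BOOLEAN NOT NULL DEFAULT true, -- soft-delete flag
    version     INTEGER NOT NULL DEFAULT 1,    -- bumped by the trigger below
    embedding   vector(1536),                  -- nullable, reserved for pgvector
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE skills_mgmt.skill_references (
    skill_name     TEXT NOT NULL
        REFERENCES skills_mgmt.skills (name) ON DELETE CASCADE,
    reference_name TEXT NOT NULL CHECK (reference_name ~ '^[a-z0-9._-]+$'),
    content        TEXT NOT NULL,
    UNIQUE (skill_name, reference_name)
);

CREATE TABLE skills_mgmt.skills_history (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT NOT NULL,
    content     TEXT NOT NULL,
    version     INTEGER NOT NULL,
    archived_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- On every update: archive the old row, bump the version, refresh the timestamp.
CREATE OR REPLACE FUNCTION skills_mgmt.archive_skill() RETURNS trigger AS $$
BEGIN
    INSERT INTO skills_mgmt.skills_history (name, description, content, version)
    VALUES (OLD.name, OLD.description, OLD.content, OLD.version);
    NEW.version    := OLD.version + 1;
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skills_archive
    BEFORE UPDATE ON skills_mgmt.skills
    FOR EACH ROW EXECUTE FUNCTION skills_mgmt.archive_skill();

-- Partial index: manifest reads only ever touch active rows.
CREATE INDEX idx_skills_active ON skills_mgmt.skills (name) WHERE active;
```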

Full DDL (table definitions, trigger function, indexes) is in the Gist linked at the bottom of this post.

6. Container setup (high level)

A separate Postgres container, on the same internal Docker network as the n8n service, with no port published to the host. Only n8n can reach it; nothing outside the Docker network can.

Image of choice: pgvector/pgvector:pg16 rather than vanilla Postgres. It’s plain Postgres 16 with the pgvector extension preinstalled; identical performance to vanilla if you don’t enable the extension. The point is forward compatibility — the day you want similarity search the migration cost is zero.

Persistent named volume for the data directory, password from environment variable, restart policy unless-stopped. Database wwu_skills, dedicated user with privileges only on the skills_mgmt namespace. In n8n, create a Postgres credential pointing at the container by hostname (wwu-skills-db) over port 5432.
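
For the container itself, a sketch of the compose service could look like this. The network and volume names, the DB user name, and the password variable are my assumptions; the exact snippet is in the Gist.

```yaml
# Sketch only: network/volume names, user name, and the password
# variable are assumptions, not the Gist version.
services:
  wwu-skills-db:
    image: pgvector/pgvector:pg16
    restart: unless-stopped
    environment:
      POSTGRES_DB: wwu_skills
      POSTGRES_USER: skills_user
      POSTGRES_PASSWORD: ${SKILLS_DB_PASSWORD}
    volumes:
      - wwu_skills_data:/var/lib/postgresql/data
    networks:
      - n8n_internal   # the same network the n8n service is attached to
    # deliberately no `ports:` entry, so nothing is published to the host

volumes:
  wwu_skills_data:

networks:
  n8n_internal:
    external: true     # assumes the n8n network already exists
```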

Full docker-compose.yml snippet is in the Gist.

7. The three workflows

7.1 tool_load_skill

Three nodes:

  • Execute Workflow Trigger (v1.1) declaring a single skill_name string input. This is the contract the LLM sees.

  • Postgres (v2.6) in executeQuery mode, with a parameterised lookup against the skills table filtering by name and the active flag, returning at most one row. The parameter is passed through options.queryReplacement as an n8n expression — this separates the value from the SQL string and prevents injection (n8n sanitises the value before it reaches the driver, per the official docs).

  • Set / If to normalise the response: either { skill_name, content } on hit, or { error: "Skill not found" } on miss. The agent understands both shapes.
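
A plausible shape for that lookup, assuming the schema from section 5 (the exact SQL and its queryReplacement mapping are in the Gist):

```sql
-- $1 is supplied via options.queryReplacement, e.g. {{ $json.skill_name }}
SELECT name AS skill_name, content
FROM skills_mgmt.skills
WHERE name = $1
  AND active = true
LIMIT 1;
```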

7.2 tool_load_reference

Same three-node shape as 7.1. Two string inputs (skill_name, reference_name), a two-parameter lookup against the skill_references table, normalised output. The queryReplacement option becomes a comma-separated list of two n8n expressions that map onto the two parameters of the query.
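
Again assuming the section 5 schema, the lookup plausibly becomes:

```sql
-- queryReplacement carries two expressions in order, e.g.
-- {{ $json.skill_name }}, {{ $json.reference_name }}
SELECT reference_name, content
FROM skills_mgmt.skill_references
WHERE skill_name = $1
  AND reference_name = $2
LIMIT 1;
```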

7.3 MAIN_AGENT

Four nodes in sequence, plus the AI Agent’s sub-nodes:

  1. Chat Trigger (v1.4) — entry point. A Webhook works equally well.

  2. Postgres in select operation mode, against the skills table in the skills_mgmt namespace. Filter active = true, output columns reduced to name and description only (no content, that would balloon the manifest), returnAll: true. The output is the active manifest.

  3. Code node — a small JavaScript snippet, six lines, that maps the incoming items to a flat array of {name, description} pairs and serialises it with JSON.stringify (a sketch follows this list). Output: one item carrying a single skills_index string field. This is the only piece of custom code in the entire system.

  4. AI Agent (v3.1, Tools Agent) — connected to your chat model, your memory store of choice (Postgres Chat Memory or Buffer Window), and the two Tool Workflows. The system message instructs the model to scan the manifest for matching skills, call load_skill before producing output, and never invent skill names. The manifest is injected at the bottom of the system message via an n8n expression that references the Build Index Code node’s output.
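
The Build Index snippet from step 3 is small enough to sketch in full. This assumes the Code node runs once for all items and that the Postgres node emits name and description per item; the exact body is in the Gist.

```javascript
// Build Index: flatten the active skills into one manifest string.
const skills = $input.all().map((item) => ({
  name: item.json.name,
  description: item.json.description,
}));

// One output item carrying a single skills_index string field.
return [{ json: { skills_index: JSON.stringify(skills) } }];
```

In the AI Agent’s system message, the manifest can then be pulled in with an expression along the lines of {{ $('Build Index').item.json.skills_index }} (the node name is whatever you called the Code node).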

7.4 Tool Workflow nodes inside MAIN_AGENT

Configure each Tool Workflow with a clear description — the LLM reads this to decide whether to call the tool. Sloppy descriptions cause sloppy routing.

The descriptions I use are roughly:

  • load_skill: loads the full instructions for a specific skill before executing the task. Must always be called when the user request matches a skill listed in the manifest; the skill_name parameter must exactly match an entry in the manifest.

  • load_reference: loads an additional reference file for an already-loaded skill. Must be called only when the loaded skill explicitly references the file and the current task actually needs that detail. Parameters: skill_name and reference_name (a filename like examples.md).

Full descriptions, verbatim, are in the Gist.

For the workflowInputs of each Tool Workflow, click the AI button on each parameter so the LLM populates them via $fromAI(). Don’t hardcode values.
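
For reference, a populated parameter ends up holding an expression roughly like this; the key, description string, and type shown here are illustrative:

```
{{ $fromAI('skill_name', 'The exact skill name, copied from the manifest', 'string') }}
```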

8. End-to-end execution flow

User: "write me the meta description for the email marketing page"

[Chat Trigger]
   |
[Postgres lookup, active skills only]    -> 12 rows
   |
[Code: build index]                      -> skills_index JSON string
   |
[AI Agent]                               system prompt with manifest baked in
   |
   +-- LLM matches "wwu-meta-description" by description
   +-- calls load_skill(skill_name="wwu-meta-description")
   |     |
   |     +-- sub-workflow runs the parameterised lookup
   |     +-- returns content (~500 tokens)
   |
   +-- LLM reads instructions, decides reference is not needed
   |
   +-- writes the meta description following the loaded rules

System prompt cost stays roughly constant regardless of how many skills exist (the manifest is short by design). Loaded skill content only enters context when actually used.

9. Limits, sharp edges, and what I’d watch out for

  • Sub-node expression scoping. n8n docs explicitly state that sub-nodes resolve expressions to the first item only. If you ever pass an array of items into a Tool Workflow, the LLM will see only the first. Plan accordingly: tools should be designed for single-item input/output.

  • No documented hard limits on Tool Workflow payload size. In practice keep skill content under ~50KB. Above that, split into references.

  • System message size. The AI Agent has no documented cap, but the underlying chat model does. With 200 skills × 80 chars description, you’re at ~16KB just for the manifest — still fine, but plan for it.

  • Streaming + tool calls. With enableStreaming: true (the default in v3.1), tool calls still work but UI streaming behaviour varies by chat trigger setup. Test carefully if UX matters.

  • Memory pollution. If your conversation memory is a long buffer window, the result of past load_skill calls stays in memory and gets resent every turn. Either use shorter memory windows or summary memory, or strip tool messages from memory when relevant.

  • Concurrent edits. Editing skills in Postgres while a workflow run is mid-flight is safe — the run already pulled its manifest. But be aware that the manifest is fetched per run, not per turn, so during a long session a hot-reloaded skill won’t be visible until the next session.

  • Description quality matters more than skill body quality. The LLM’s routing is entirely driven by descriptions. Spend time on them. Bad description = right skill never gets called.

10. What this unlocks

  • One AI Agent that scales across 50+ specialist tasks without prompt bloat.

  • Editing a skill = updating one row, no workflow redeploy.

  • A clear path to RAG: drop in pgvector, embed descriptions, replace the static manifest with a search_skills(query) tool (a sketch follows this list). Same architecture, just a smarter index.

  • Easy export to markdown for backup or Git versioning — a separate Cron-triggered workflow can dump every skill as a SKILL.md file to S3 / Drive / GitHub.
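
On the SQL side, that hypothetical search_skills tool could be as small as this, assuming the embedding column from section 5 is populated and $1 carries the embedded query vector:

```sql
-- Hypothetical search_skills query: nearest active skills by cosine distance.
SELECT name, description
FROM skills_mgmt.skills
WHERE active = true
ORDER BY embedding <=> $1
LIMIT 5;
```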

11. What I’d love feedback on

  • Has anyone tried this with n8n Data Tables at meaningful scale? My pessimism in section 3 is informed but not from production experience at >100 skills.

  • Anyone implemented the search_skills RAG variant? Curious whether you saw routing quality improve over the manifest-in-prompt approach, or whether it just added latency.

  • The memory pollution issue (section 9, point 5) — what’s the cleanest pattern in n8n to strip past tool results from memory?

Happy to share the workflow JSON exports if there’s interest. Also happy to be told I’m overcomplicating this and Data Tables would have been fine.

12. Companion Gist (the actual code)

I deliberately kept this post code-light to keep it readable (and to dodge over-eager WAF rules that don’t love long blocks of DDL in forum posts). The full code lives in a public Gist:

:backhand_index_pointing_right: n8n + Postgres Agent Skills - companion code · GitHub

The Gist contains, in order:

  1. The full Postgres schema (table definitions, history trigger function, indexes, optional pgvector column).

  2. The docker-compose.yml snippet for the Postgres container.

  3. The exact SQL behind tool_load_skill and tool_load_reference, with the queryReplacement mapping.

  4. The Build Index Code node body.

  5. The full system message text for the AI Agent, with the manifest expression at the bottom.

  6. The two Tool Workflow descriptions verbatim.

Cheers


Hey @mredodos, hey everyone! :waving_hand:
I absolutely love the concept of applying Anthropic-style progressive disclosure to n8n agent skills; it’s such a smart way to keep prompts lean and save on token costs.

I just wanted to drop my two cents on the database layer aspect of this, specifically when we’re comparing n8n’s tables to PostgreSQL for AI memory.

While the tutorial does a great job showing how to hook skills up to a database, it’s super important for folks building this out to remember that n8n tables and Postgres are completely different beasts under the hood.

n8n Tables: Think of these as a super convenient flat-file abstraction (sort of like a folder-based DB). They are absolutely brilliant for rapid prototyping, quick state saving, or logging.

PostgreSQL: This is a heavyweight, full-fledged Relational Database Management System (RDBMS).

When deciding where your AI agents will store and retrieve their context, you have to look beyond just “storing data” and weigh three things: (1) speed, (2) structure, and (3) business complexity.

If your agent needs to do complex relational lookups, an RDBMS like Postgres handles that natively in milliseconds. Forcing an n8n table to do that kind of highly concurrent, complex querying is a fast track to bottlenecking your flow.

If you’re scaling an agent’s memory for production, the speed, ACID compliance, and advanced features (like pgvector for AI embeddings) of Postgres make it a no-brainer.

Thanks again for the great write-up!
