OFFICIAL ARCHITECTURE STUDY: DAEMON V4 (Stateful Distributed Cognitive System via Chat Hub)

Hey n8n community! :waving_hand:

After an intense 24-hour iterative engineering sprint, battling memory limits and diving deep into self-hosted .env optimizations, I want to share a comprehensive architecture study on building a Stateful Distributed Cognitive System entirely inside self-hosted n8n v2.

I call this project Daemon V4—a personal Proto-Cognitive AI Companion. My objective was to migrate from a highly deterministic but token-heavy “Flat-Injection” architecture to a scalable “Distributed Multi-Model Hybrid RAG” system, without losing the absolute strictness of the AI’s system governance.

Here is the breakdown of the architecture, the bottlenecks we hit, and how we solved them natively in n8n.


:brain: PHASE 1: COGNITIVE FOUNDATION & STRICT GOVERNANCE

Before upgrading the memory infrastructure, the Agent’s foundational reasoning was surgically constrained to eliminate “AI Sycophancy” (hallucinated compliance) and rule-drift.

  • The Verificator Patch: Implemented strict rule extraction. If a user suggests a rule change, the Agent is hard-coded to output a specific PENDING_APPROVAL template. The Agent cannot alter its own behavior or claim “internalization” until the human Architect manually sets the database status to APPROVED.
  • Hierarchy of Truth (Rule 36): A hard-coded priority matrix injected into Layer 0. Priority 1 (System Rules) ALWAYS overrides Priority 5 (Retrieved PGVector Data). This completely immunizes the Agent against “Semantic Drift” where retrieved RAG documents might contradict core directives.
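The Verificator gate can be sketched as a small function, e.g. inside an n8n Code node (all field names here are hypothetical, not the actual schema): a proposed rule change is never applied directly but rewritten into a PENDING_APPROVAL record that only a human can promote to APPROVED in the database.

```javascript
// Sketch of the Verificator gate (hypothetical field names).
// A rule-change proposal is downgraded to a PENDING_APPROVAL record;
// the Agent itself can never set the status to APPROVED.
function gateRuleChange(agentOutput) {
  if (agentOutput.intent !== 'RULE_CHANGE') {
    return agentOutput; // normal conversation passes through untouched
  }
  return {
    intent: 'RULE_CHANGE',
    status: 'PENDING_APPROVAL', // only the human Architect flips this
    proposed_rule: agentOutput.proposed_rule,
    requested_at: new Date().toISOString(),
    chat_response:
      'Rule change recorded as PENDING_APPROVAL. Awaiting Architect sign-off.',
  };
}

// Example: a proposal is intercepted, not internalized
const gated = gateRuleChange({
  intent: 'RULE_CHANGE',
  proposed_rule: 'Allow casual tone in replies',
});
console.log(gated.status); // "PENDING_APPROVAL"
```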

:file_cabinet: PHASE 2: HYBRID RAG ARCHITECTURE (THE 3-LAYER SEPARATION)

The legacy system injected 5 compartments of raw database text into the System Prompt. V4 eliminates this token bloat via a Hybrid approach:

  • Layer 0 (The Vault - Deterministic): An SQL Node fetches [SYSTEM_RULES] and a JSON-compressed [OPEN_LOOPS] state, injecting them directly into the System Message. Guarantees 100% governance compliance.
  • Layer 1 & 2 (Conversational Context): Utilizes n8n’s native Window Buffer Memory bound to Postgres Chat Memory.
  • Layer 3 (Semantic Knowledge): Massive project blueprints and external web research are stored in PGVector, accessed ONLY via an Agent Tool (Retrieve as Tool mode).
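To make Layer 0 concrete, here is a minimal sketch of how the deterministic System Message could be assembled from DB rows, e.g. in a Code node between the SQL node and the Agent (the row shapes are assumptions, not the actual table schema):

```javascript
// Sketch: composing the Layer 0 System Message from deterministic DB rows
// (hypothetical row shapes). OPEN_LOOPS stays as compact JSON to save tokens.
function buildSystemMessage(rules, openLoops) {
  const ruleBlock = rules.map((r, i) => `${i + 1}. ${r.text}`).join('\n');
  return [
    '[SYSTEM_RULES]',
    ruleBlock,
    '[OPEN_LOOPS]',
    JSON.stringify(openLoops),
  ].join('\n');
}

const msg = buildSystemMessage(
  [{ text: 'System Rules always override retrieved PGVector data.' }],
  { tasks: ['finish RAG ingestion'] }
);
console.log(msg.startsWith('[SYSTEM_RULES]')); // true
```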

:robot: PHASE 3: DISTRIBUTED MULTI-MODEL ORCHESTRATION

To prevent the 15x token bloat typical of native Multi-Agent setups, V4 utilizes a Single-Agent Orchestrator with Async Workers.

  • Parent Workflow (Gemini Flash): Handles the Chat Hub UI. Extremely fast. Evaluates if background work is needed.
  • Child 1 & 3 (Ingestion): Gemini models that scrape the web or process documents, generate embeddings, and insert into PGVector.
  • Child 2 (Rule Verificator): Gemini model parsing meta-conversations for system rule updates.
  • Child 4 (The Ollama Sweeper): A zero-API-cost local LLM. Triggered every N interactions to compress the Postgres Chat Memory and update the Layer 0 [OPEN_LOOPS] JSON state, preventing memory OOM and token bloat.
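The Sweeper's contract can be sketched like this (hypothetical shapes; the real compression is done by the local Ollama model, stubbed out here). Every N interactions, the chat history collapses into one summary plus the latest turns:

```javascript
// Sketch of the Sweeper trigger and contract (hypothetical).
const SWEEP_EVERY_N = 10; // assumed tunable, not an n8n built-in

function shouldSweep(interactionCount) {
  return interactionCount > 0 && interactionCount % SWEEP_EVERY_N === 0;
}

function sweep(messages) {
  // Stand-in for the Ollama call: keep a summary plus the two latest turns
  return {
    summary: `Compressed ${messages.length} messages`,
    recent: messages.slice(-2),
  };
}

console.log(shouldSweep(10)); // true
console.log(sweep(['a', 'b', 'c', 'd']).recent.length); // 2
```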

:hammer_and_wrench: PHASE 4: INFRASTRUCTURE, SECURITY & RESOLVING N8N BOTTLENECKS

Through cross-validation with the n8n docs and DevOps audits, we solved these critical bottlenecks:

A. The Sub-Workflow Async Bug → LLM-Driven Webhook Routing

  • Problem: n8n’s Execute Sub-workflow node can sometimes hang on fire-and-forget tasks.
  • Solution: The Parent Agent uses a Structured Output Parser to cleanly output JSON: {"chat_response": "...", "trigger_child": "webhook_url"}. An Edit Fields node separates the conversational text to the Chat Hub UI, while sending an HTTP Request to fire the Child Workflow’s Webhook asynchronously.
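A minimal sketch of that routing split, assuming the parsed JSON follows the Structured Output Parser schema described above (the function and field names are illustrative): the chat text goes one way to the Chat Hub UI, the optional webhook URL the other way to an HTTP Request node.

```javascript
// Sketch of the routing split (hypothetical names, schema from the post):
// chat_response feeds the UI; trigger_child, when present, is fired
// asynchronously via an HTTP Request node.
function routeAgentOutput(parsed) {
  return {
    uiItem: { chat_response: parsed.chat_response },
    // null when the Parent decided no background work is needed
    childWebhook: parsed.trigger_child || null,
  };
}

const routed = routeAgentOutput({
  chat_response: 'On it - ingesting the document in the background.',
  trigger_child: 'http://n8n:5678/webhook/ingest-doc',
});
console.log(routed.childWebhook); // "http://n8n:5678/webhook/ingest-doc"
```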

B. Concurrency & OOM Prevention (DevOps Hardening)

  • Problem: Simultaneous async webhooks can exhaust the default Postgres connection pool and bloat the execution database.
  • Solution: Server .env explicitly hardened:
    • DB_POSTGRESDB_POOL_SIZE=10
    • N8N_CONCURRENCY_PRODUCTION_LIMIT=5 (Works beautifully because we use Webhooks, not native sub-workflows).
    • EXECUTIONS_TIMEOUT=60
    • EXECUTIONS_DATA_SAVE_ON_SUCCESS=none (Crucial for background workers).

C. Internal Webhook Security (SSRF Loopback)

  • Problem: n8n v2 SSRF protection blocks localhost HTTP requests (ECONNREFUSED).
  • Solution: Routing via Docker internal service names (e.g., http://n8n:5678), adding it to N8N_SSRF_ALLOWED_HOSTNAMES, and enforcing an explicit Authorization: Bearer <secret> header on all Child Webhooks to prevent internal prompt-injection abuse.
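The child-side check can be sketched as follows (the secret name is hypothetical; in practice this maps to Header Auth credentials on the child's Webhook node, shown here as the Code-node equivalent):

```javascript
// Sketch: child webhook bearer check (hypothetical secret).
// Note: Node.js lowercases incoming HTTP header names.
function isAuthorized(headers, secret) {
  return headers['authorization'] === `Bearer ${secret}`;
}

console.log(isAuthorized({ authorization: 'Bearer s3cr3t' }, 's3cr3t')); // true
console.log(isAuthorized({}, 's3cr3t')); // false
```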

D. Vector Deduplication (Manual Upsert)

  • Problem: PGVector in n8n lacks native Upsert or pre-check, leading to duplicate embeddings.
  • Solution: Child Ingestion workflows run a manual pre-check: Get Many (Limit: 5, Threshold: 0.88) → IF Node (Is Empty) → Insert. If any existing vector with similarity > 0.88 is found, ingestion is aborted.
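The IF-node decision boils down to this one predicate (sketch; neighbour shape is an assumption about what Get Many returns, the 0.88 threshold is the value from the workflow):

```javascript
// Sketch of the manual upsert pre-check: insert only when no existing
// neighbour is already above the similarity threshold.
const SIMILARITY_THRESHOLD = 0.88;

function shouldInsert(neighbours) {
  return !neighbours.some((n) => n.similarity > SIMILARITY_THRESHOLD);
}

console.log(shouldInsert([{ similarity: 0.91 }])); // false - near-duplicate found
console.log(shouldInsert([{ similarity: 0.42 }])); // true - safe to insert
console.log(shouldInsert([])); // true - nothing similar exists yet
```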

E. Async Cascade Effect (Interaction Consistency)

  • Problem: Background children modifying the Layer 0 state mid-conversation could cause the Parent to experience “logic jumps”.
  • Solution: Implementing a “State Snapshot per Interaction”. The Parent reads the Layer 0 SQL data once, at the moment the Chat Trigger fires, and reuses that snapshot for the entire interaction. Any updates from the Ollama Sweeper or RAG Ingestion only apply from the next Chat Trigger onward.
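The snapshot semantics can be demonstrated with an in-memory stand-in for the Layer 0 table (a sketch, not the actual SQL implementation): the Parent deep-copies the state at trigger time, so background writes don't leak into the running interaction.

```javascript
// Sketch of "State Snapshot per Interaction" using an in-memory stand-in
// for the Layer 0 SQL table (hypothetical shape).
const layer0 = { open_loops: ['task A'] };

function snapshotLayer0(state) {
  return JSON.parse(JSON.stringify(state)); // deep copy = frozen view
}

const snap = snapshotLayer0(layer0); // taken when the Chat Trigger fires
layer0.open_loops.push('task B');    // Sweeper writes mid-conversation

console.log(snap.open_loops.length);   // 1 - current interaction unaffected
console.log(layer0.open_loops.length); // 2 - visible from the next trigger on
```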

:bullseye: CONCLUSION

The Daemon V4 architecture transcends standard chatbot logic. By combining deterministic prompt injection, semantic vector retrieval, strict Header Auth security, local LLM memory compression, and explicit DB pool concurrency controls, it represents a highly scalable, enterprise-grade Stateful Distributed Cognitive System built entirely on n8n v2.

I’d love to hear your thoughts or answer any questions if anyone is building something similar! Let me know if you want me to share specific JSON node snippets for the Webhook Routing or Vector Deduplication logic.

Happy automating! :rocket: