Hi everyone,
I’m experimenting with a local-first voice AI agent using n8n.
The current setup uses:
- Python script for audio input/output
- local STT and TTS services
- n8n for the workflow
- PostgreSQL for structured memory
- pgvector for semantic long-term memory
- LM Studio for local LLM inference
The loop is already working end-to-end:
voice input → STT → n8n → PostgreSQL/pgvector memory → local LLM → TTS → voice output
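As a rough illustration of that loop (a sketch, not the actual prototype code), each stage can be modeled as a plain function and timed individually, which shows where a conversational turn spends its time. The `run_turn` helper and the stand-in stage functions below are hypothetical; in the real setup each one would call the STT service, the n8n webhook, or the TTS service.

```python
import time


def run_turn(audio, stages):
    """Run (name, function) stages in order, passing each output to the next.

    Returns the final output and a dict of per-stage durations in seconds.
    """
    data = audio
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    return data, timings


# Hypothetical stand-ins for the real services (assumptions for illustration).
example_stages = [
    ("stt", lambda audio: audio.strip().lower()),    # speech-to-text placeholder
    ("workflow", lambda text: f"reply to: {text}"),  # n8n + memory + LLM placeholder
    ("tts", lambda text: text.encode("utf-8")),      # text-to-speech placeholder
]

reply_audio, stage_timings = run_turn("  Hello there ", example_stages)
for stage, seconds in stage_timings.items():
    print(f"{stage}: {seconds * 1000:.2f} ms")
```

Logging these per-stage numbers for every turn is one simple way to see whether STT, the workflow, or TTS dominates the delay.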
It still feels more like a voice-enabled chat than a natural real-time conversation, so I’m now working on latency, memory retrieval quality, context filtering, and observability.
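For the context filtering part, here is a minimal sketch of one possible approach, assuming the retrieval step already returns each memory with a similarity score (for example `1 - (embedding <=> query)` with pgvector's cosine distance operator). The `Memory` class, the `filter_context` function, and all threshold values are hypothetical and would need tuning.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float  # assumed: 1 - cosine distance, higher is more relevant


def filter_context(memories, min_similarity=0.75, max_items=3, max_chars=600):
    """Keep the most relevant memories that fit within a small prompt budget."""
    kept = []
    used_chars = 0
    # Walk candidates from most to least similar.
    for memory in sorted(memories, key=lambda m: m.similarity, reverse=True):
        # Stop once candidates are too weak or enough items are selected.
        if memory.similarity < min_similarity or len(kept) >= max_items:
            break
        # Skip a memory that would overflow the character budget.
        if used_chars + len(memory.text) > max_chars:
            continue
        kept.append(memory)
        used_chars += len(memory.text)
    return kept


candidates = [
    Memory("User prefers short answers.", 0.91),
    Memory("User asked about the weather last week.", 0.42),
    Memory("User's name is Alex.", 0.80),
]
for memory in filter_context(candidates):
    print(f"{memory.similarity:.2f}  {memory.text}")
```

Sending fewer, more relevant memories to the LLM also shortens the prompt, which tends to help latency as well as answer quality.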
I’ve documented the prototype here:
It is still a work in progress, but I’d love feedback from the community, especially around workflow structure, hybrid memory patterns, and pgvector retrieval. There is so much to improve and learn!
Thanks!