Show and Tell: ZapPro — WhatsApp AI Agent Template for n8n (triage, handoff, follow-up)

Hey everyone! Sharing something I’ve been running in production for a few months — a WhatsApp AI agent built entirely in n8n.

What it does:

  • Receives WhatsApp messages via webhook and buffers them (10s window) to handle rapid-fire messages as one unit
  • Runs an AI agent (Claude 3.5 Haiku via OpenRouter) with a business-specific system prompt
  • Detects urgency via keyword regex or an AI-emitted {handoff} tag, then notifies the team with full conversation context
  • Human takeover mode: team types “unlock” to re-enable the bot after handling
  • Follow-up sequences: D+1 and D+3 automatic follow-ups via cron (suppressed if human mode active, opted out, or booking confirmed)
  • Automatic fallback: if OpenRouter goes down, switches to Anthropic direct API seamlessly
  • Google Calendar integration for real-time appointment scheduling
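
For anyone who wants to see the shape of the triage and follow-up rules from the list above, here is a minimal Python sketch. It is not the template's actual node logic: the keyword list, the `Conversation` fields, and the choice to count D+1/D+3 from the last contact time are all assumptions made for illustration. Only the {handoff} tag and the three suppression conditions come from the post.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical keyword list; a real deployment would tune this per business.
URGENT_RE = re.compile(r"\b(urgent|emergency|severe pain|bleeding)\b", re.IGNORECASE)
HANDOFF_TAG = "{handoff}"  # tag named in the post, emitted by the agent in its reply


def needs_handoff(user_text: str, ai_reply: str) -> bool:
    """True if either the keyword regex or the AI tag signals urgency."""
    return bool(URGENT_RE.search(user_text)) or HANDOFF_TAG in ai_reply


@dataclass
class Conversation:
    """Hypothetical per-chat state; field names are illustrative."""
    last_contact_at: datetime  # assumed reference point for D+1 / D+3
    human_mode: bool = False
    opted_out: bool = False
    booking_confirmed: bool = False
    followups_sent: int = 0  # 0 = none yet, 1 = D+1 sent, 2 = D+3 sent


FOLLOWUP_DAYS = (1, 3)  # D+1 and D+3


def followup_due(conv: Conversation, now: datetime) -> Optional[int]:
    """Return the day offset (1 or 3) of the follow-up to send now, or None."""
    # Any of the three suppression conditions blocks every follow-up.
    if conv.human_mode or conv.opted_out or conv.booking_confirmed:
        return None
    if conv.followups_sent >= len(FOLLOWUP_DAYS):
        return None
    day = FOLLOWUP_DAYS[conv.followups_sent]
    if now - conv.last_contact_at >= timedelta(days=day):
        return day
    return None
```

A cron run would call `followup_due` once per open conversation and send a message only when it returns a day offset.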

Architecture highlights:

  • Message buffer in PostgreSQL (INSERT → wait 10s → SELECT → DELETE → AI processes aggregated text)
  • FOR UPDATE SKIP LOCKED prevents duplicate AI responses on concurrent webhooks
  • Error monitoring workflow catches DNS/connection failures and alerts via WhatsApp
  • All credentials via n8n credential manager — no hardcoded keys
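
The buffer flow above can be sketched roughly as follows. This is one possible way to write it, not the template's exact queries. The table name `wa_msg_buffer` appears later in this thread, but the columns (`id`, `chat_id`, `body`) are assumed, and the sketch folds the SELECT and DELETE steps into a single statement so the claim is atomic. The `%s` placeholders follow the psycopg style; n8n's Postgres node uses its own parameter syntax.

```python
from typing import Iterable, Tuple

# Step 1: each incoming webhook appends its message to the buffer.
INSERT_SQL = "INSERT INTO wa_msg_buffer (chat_id, body) VALUES (%s, %s)"

# Step 2 (after the 10s wait): claim and delete this chat's rows in one go.
# Rows locked by a concurrent execution are skipped instead of waited on,
# so each buffered message is returned to exactly one execution.
CLAIM_SQL = """
WITH claimed AS (
    SELECT id
    FROM wa_msg_buffer
    WHERE chat_id = %s
    ORDER BY id
    FOR UPDATE SKIP LOCKED
)
DELETE FROM wa_msg_buffer AS b
USING claimed
WHERE b.id = claimed.id
RETURNING b.id, b.body
"""


def aggregate(rows: Iterable[Tuple[int, str]]) -> str:
    """Join claimed (id, body) rows into one text block, oldest first.

    RETURNING does not guarantee row order, so sort by id here.
    An empty result means another execution already took the rows.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    return "\n".join(body for _, body in ordered)
```

If `aggregate` returns an empty string, the execution can stop without calling the AI agent, which is what keeps concurrent webhooks from producing duplicate replies.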

The demo video shows a dental clinic use case (our pilot client), but the template works for any service business.

I packaged this as a ready-to-import n8n template with full setup guide. Available at zapproai.com — Core ($297) and Pro ($497, adds scheduling + follow-up sequences).

Happy to answer questions about the architecture!

Good catch. To be honest, that’s the weakest point in the current setup.

Right now: no TTL, no dead-letter. If n8n crashes between INSERT and DELETE, rows sit in wa_msg_buffer indefinitely. The mitigation is a monitoring workflow (separate cron) that scans for messages older than 15 minutes and fires a WhatsApp alert to the team. Cleanup is then manual via a quick DELETE query.

It works fine at 1-4 client scale but I documented it as a known gap for when you scale past ~5 concurrent clients. The proper fix is a created_at column with an automated cleanup cron: delete anything older than 2 minutes that wasn’t processed (implying the webhook died mid-window). A dead-letter table would be cleaner but adds complexity that doesn’t justify itself until you have meaningful volume.

The FOR UPDATE SKIP LOCKED pattern actually helps here too: if the worker crashes after SELECT but before DELETE, Postgres releases the row lock when the transaction rolls back or the connection closes, so the next execution can pick the row up. Postgres handles that part cleanly.

Update: adding an automated cleanup cron to the monitoring workflow this week — will post the fix here when it’s live.

Hey Benjamin!

We implemented the stuck message detection and auto-cleanup feature you asked about.

The Monitor Central workflow now includes a parallel branch that runs every 30 minutes:

  • CHECK_STUCK_BUFFER — queries wa_msg_buffer for messages older than 5 minutes that weren’t processed

  • Auto-cleanup — uses FOR UPDATE SKIP LOCKED (CTE) to safely delete stuck rows without interfering with active executions

  • Alert — sends a WhatsApp notification with the count and affected chat IDs whenever stuck messages are found and cleaned
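
A rough sketch of what the cleanup and alert steps above amount to, written in Python only to make the logic easy to read. The actual workflow uses native n8n nodes, not a Code node. The `created_at` and `chat_id` column names, the `%s`-free query form, and the alert wording are assumptions. The 5-minute threshold and the CTE with FOR UPDATE SKIP LOCKED come from the description above.

```python
from typing import Iterable, Optional, Tuple

# Delete rows older than 5 minutes. Rows currently locked by an active
# execution are skipped, so in-flight buffers are left alone.
STUCK_CLEANUP_SQL = """
WITH stuck AS (
    SELECT id
    FROM wa_msg_buffer
    WHERE created_at < now() - interval '5 minutes'
    FOR UPDATE SKIP LOCKED
)
DELETE FROM wa_msg_buffer AS b
USING stuck
WHERE b.id = stuck.id
RETURNING b.id, b.chat_id
"""


def build_alert(deleted: Iterable[Tuple[int, str]]) -> Optional[str]:
    """Format the alert from deleted (id, chat_id) rows; None if nothing was cleaned."""
    rows = list(deleted)
    if not rows:
        return None  # nothing stuck, so no alert is sent
    chat_ids = sorted({chat_id for _, chat_id in rows})
    return (
        f"Buffer cleanup: removed {len(rows)} stuck message(s) "
        f"from chats: {', '.join(chat_ids)}"
    )
```

Returning `None` on an empty result maps to the IF node that skips the alert branch when the query deletes nothing.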

This is already included in the Pro template (ZapPro_Monitor_Central.json) — 15 nodes, all in English with placeholder configs.

The workflow now has 14 nodes total (up from 7) and handles both error monitoring AND buffer health in a single cron cycle. No Code node used — all native n8n nodes (Set, IF, Postgres, HTTP Request) to avoid VM2 sandbox timeout issues on some hosting providers.

Let me know if you have any questions on the setup!