AI Inquiry Classifier + Auto CRM — webhook → Groq AI → Google Sheets (free APIs)

Built this for service businesses that get inquiries via WhatsApp, web forms, or any channel.

How it works:

  1. Customer sends a message to a webhook
  2. Groq AI classifies it: category (appointment / quote / inquiry), urgency, one-line summary, and draft reply
  3. Row appended to Google Sheets automatically
  4. Webhook returns the classification + reply in under 2 seconds
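
Step 2 only works if the model's output can be parsed reliably before it reaches the sheet. A minimal Python sketch of that validation follows. The JSON field names, the `unknown` urgency value, and the fallback to `inquiry` are assumptions for illustration, not taken from the workflow itself:

```python
import json

# Categories named in the post; the field names below are assumed.
ALLOWED_CATEGORIES = {"appointment", "quote", "inquiry"}
FIELDS = ("category", "urgency", "summary", "reply")


def parse_classification(raw: str) -> dict:
    """Parse the model's JSON text, falling back to a safe default on bad output."""
    fallback = {"category": "inquiry", "urgency": "unknown", "summary": "", "reply": ""}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    category = data.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return fallback
    # Missing fields are filled from the fallback so every row has four columns.
    return {field: str(data.get(field, fallback[field])) for field in FIELDS}


def to_sheet_row(result: dict) -> list:
    """Order the fields as one spreadsheet row."""
    return [result[field] for field in FIELDS]
```

Falling back to a default row instead of raising keeps the webhook responding even when the model returns malformed text.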

Zero monthly cost — runs on Groq free tier (14,400 req/day) and Google Sheets via Service Account.

Tested with dental clinics, law firms, real estate agents.

The workflow JSON + full setup guide (Groq key, Service Account, spreadsheet, test curl) are available here:


Nice work @Diego_Ostro — the vertical focus (dental/legal/real estate) is the right call. Generic chatbot builders drown in edge cases; vertical-specific classification is way more accurate.

@Benjamin_Behrens raises a great point about ambiguous messages. We run a WhatsApp AI agent for a dental clinic in production (500+ conversations/month) and hit exactly this. Our approach: instead of a rigid classifier with fixed categories, we use an AI Agent node with a detailed system prompt that handles triage conversationally. The AI collects what it needs (name, procedure, urgency) across multiple messages, then decides the routing.

For the overlap problem (“I need to reschedule AND get a quote”), the agent handles it naturally in one conversation thread rather than trying to force a single category. When it hits something it can’t resolve (emergencies, complex insurance questions), it includes a {handoff} tag in its output — a downstream IF node catches that and routes to the human team with full conversation context.
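
The tag check that the IF node performs could look roughly like this in Python. This is a sketch only: the function name and the `human`/`bot` route labels are hypothetical, and stripping the tag before anyone sees the text is an assumption about the desired behavior:

```python
HANDOFF_TAG = "{handoff}"


def route_reply(ai_output: str) -> tuple:
    """Return (route, text): 'human' if the tag is present, otherwise 'bot'."""
    if HANDOFF_TAG in ai_output:
        # Remove the tag and collapse leftover whitespace so it never reaches the customer.
        cleaned = " ".join(ai_output.replace(HANDOFF_TAG, " ").split())
        return "human", cleaned
    return "bot", ai_output.strip()
```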

A few production patterns that might help anyone building this:

Message buffering for WhatsApp: People send 3-4 messages in a row before the bot can reply. We INSERT each message into a Postgres buffer table, WAIT 10 seconds, then SELECT + aggregate all messages for that phone number before sending to the AI. Without this, you get fragmented replies to each partial message.
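
A small Python sketch of the insert-then-aggregate idea, using the standard-library `sqlite3` module as a stand-in for Postgres so it runs anywhere. The table name `msg_buffer`, its columns, and the choice to delete rows once they are aggregated are assumptions; the 10-second wait is left to the caller:

```python
import sqlite3


def make_buffer() -> sqlite3.Connection:
    """Create an in-memory buffer table (stand-in for the Postgres table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msg_buffer ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, phone TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def buffer_message(conn: sqlite3.Connection, phone: str, body: str) -> None:
    """Store one incoming message fragment."""
    conn.execute("INSERT INTO msg_buffer (phone, body) VALUES (?, ?)", (phone, body))
    conn.commit()


def drain_messages(conn: sqlite3.Connection, phone: str) -> str:
    """Join all buffered fragments for a phone number in arrival order, then clear them."""
    rows = conn.execute(
        "SELECT id, body FROM msg_buffer WHERE phone = ? ORDER BY id", (phone,)
    ).fetchall()
    if not rows:
        return ""
    conn.executemany("DELETE FROM msg_buffer WHERE id = ?", [(row[0],) for row in rows])
    conn.commit()
    return "\n".join(row[1] for row in rows)
```

Calling `drain_messages` after the wait gives the AI one combined message instead of several partial ones.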

Race condition fix: If two webhook executions fire for the same phone number simultaneously, use SELECT … FOR UPDATE SKIP LOCKED on the buffer query. The second execution gets zero rows and exits cleanly — no duplicate AI responses.
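
One way to express that claim step from Python, as a sketch. It assumes a Postgres DB-API cursor with `%s` placeholders (psycopg style), a hypothetical `msg_buffer` table with `id`, `phone`, and `body` columns, and it wraps the locking SELECT in a DELETE so claimed rows are consumed, which goes beyond what the post states:

```python
# Rows locked by a concurrent execution are skipped, so the second
# execution gets zero rows instead of waiting or double-processing.
CLAIM_SQL = """
DELETE FROM msg_buffer
WHERE id IN (
    SELECT id FROM msg_buffer
    WHERE phone = %s
    ORDER BY id
    FOR UPDATE SKIP LOCKED
)
RETURNING id, body
"""


def claim_messages(cursor, phone: str) -> str:
    """Claim and join all unlocked buffered messages; empty string means nothing to do."""
    cursor.execute(CLAIM_SQL, (phone,))
    # RETURNING does not guarantee order, so sort by id here.
    rows = sorted(cursor.fetchall())
    return "\n".join(body for _id, body in rows)
```

The statement has to run inside a transaction that stays open until the AI call is dispatched; otherwise the locks are released too early to protect anything.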

Urgency fallback: Don’t rely only on the AI to flag urgency. Add a structural regex check on the raw input (keywords like “emergency”, “bleeding”, “severe pain”) as an OR condition alongside the AI’s judgment. If the AI misses it, the regex catches it deterministically.
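
A minimal Python sketch of that OR condition. The keyword list is just the three examples above and would need extending per clinic; the function and parameter names are hypothetical:

```python
import re

# Deterministic keyword check on the raw input, case-insensitive.
URGENCY_PATTERN = re.compile(r"\b(emergency|bleeding|severe\s+pain)\b", re.IGNORECASE)


def is_urgent(raw_message: str, ai_flagged_urgent: bool) -> bool:
    """Urgent if either the AI says so or the raw text matches a keyword."""
    return ai_flagged_urgent or bool(URGENCY_PATTERN.search(raw_message))
```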

The classification + Google Sheets approach is solid for the first layer. The next evolution is usually adding state management (Postgres table tracking conversation stage, human mode flag, follow-up scheduling) — that’s where it goes from a classifier to a full agent.


Right now it picks one — the prompt forces Groq to return a single category, so for overlap cases it routes by dominant intent. “Reschedule + quote” would log as reschedule and the quote request gets dropped.

The fix I have in mind for v2 is letting the model return a JSON array of categories, then splitting the workflow into parallel branches. Single-intent messages are the 90% case in the verticals I’m targeting, so I shipped it that way first, but you’re right that production edge cases eat you alive if you ignore it.
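
A rough Python sketch of what parsing that array could look like before the workflow fans out into branches. The function name is hypothetical, and accepting a bare string as well as an array is an assumption made to stay compatible with the current single-category output:

```python
import json


def parse_categories(raw: str, allowed: set) -> list:
    """Return the valid, de-duplicated categories in the order the model listed them."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if isinstance(data, str):
        data = [data]  # tolerate the v1 single-category shape
    if not isinstance(data, list):
        return []
    categories = []
    for item in data:
        if isinstance(item, str) and item in allowed and item not in categories:
            categories.append(item)
    return categories
```

Each returned category would then drive one branch; an empty list is the signal to fall back to a default route.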

Are you running into this with a live setup or just scoping it out?

Thanks @ZapPro_Templates!

This is genuinely useful — thanks for writing it out.

The conversational agent approach makes sense for WhatsApp specifically. The classifier I built assumes one message = one intent = one routing decision, which holds for web forms but falls apart fast when someone’s typing across 4 messages on their phone.

The message buffering pattern is something I hadn’t seen documented cleanly before. The 10-second WAIT + aggregate step is obvious in hindsight, but I can see how easy it is to skip it and end up with fragmented replies. And using SELECT … FOR UPDATE SKIP LOCKED for race conditions is a great implementation!

A question on the {handoff} tag approach: do you find the AI reliably self-identifies when it’s out of its depth, or do you also have a turn-count fallback (e.g., after N exchanges without resolution, force escalate)?

The state management layer you’re describing is clearly where this needs to go for real clinic deployments. Right now this template is intentionally shallow — meant as a starting point for people who need classification before they’re ready to manage conversation state. The Postgres table + human mode flag architecture would be a solid v2.

Appreciate you sharing the production detail!

Great questions, Diego.

On {handoff} reliability — we don’t rely on the AI alone. The HANDOFF_CHECK node uses an OR condition: it checks if the AI output contains {handoff} OR if the patient’s input matches a regex of urgency keywords (severe pain, bleeding, emergency, etc.). So even if the AI misses the tag, the system escalates deterministically. In production we’ve seen the AI include {handoff} correctly ~95% of the time with Claude 3.5 Haiku — the structural fallback catches the rest.

Turn-count fallback — honest answer: we don’t have one yet. Right now if the AI loops without resolution, it just keeps going. Your suggestion is solid and it’s now on our roadmap. Thinking something like: after 6-8 exchanges without {handoff} or a clear resolution, force escalate. Simple counter in Postgres, increment per exchange per session, reset on handoff or new session.
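
The counter logic described here is simple enough to sketch in Python. An in-memory dict stands in for the Postgres table, the class and method names are hypothetical, and the limit of 6 is just the low end of the range mentioned above:

```python
MAX_EXCHANGES = 6  # assumed default, from the "6-8 exchanges" idea


class TurnCounter:
    """Counts exchanges per session and signals when to force an escalation."""

    def __init__(self, limit: int = MAX_EXCHANGES):
        self.limit = limit
        self._counts = {}

    def record_exchange(self, session_id: str) -> bool:
        """Increment the session's counter; True means the limit has been reached."""
        self._counts[session_id] = self._counts.get(session_id, 0) + 1
        return self._counts[session_id] >= self.limit

    def reset(self, session_id: str) -> None:
        """Clear the counter on handoff or when a new session starts."""
        self._counts.pop(session_id, None)
```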

On the state management layer — fully agree that the template is intentionally shallow. Our production setup (a dental clinic running since March) has the full stack: wa_human_mode table for blocking the bot during human conversations (24h TTL, auto-expire), wa_msg_buffer for the aggregation pattern, wa_followup for D+1/D+3 follow-up sequences, and wa_contact_usage for per-client message counting. The Postgres + human mode flag architecture is exactly what we’d call v2 — and it works well at single-clinic scale.
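
As an illustration of the 24h TTL on the human mode flag, the expiry check might look like this in Python. The function name and the idea of storing a single start timestamp per contact are assumptions; the real table layout is not shown in this thread:

```python
from datetime import datetime, timedelta
from typing import Optional

HUMAN_MODE_TTL = timedelta(hours=24)


def bot_is_blocked(human_mode_started_at: Optional[datetime], now: datetime) -> bool:
    """True while a human-mode flag exists and is younger than the TTL."""
    if human_mode_started_at is None:
        return False  # no human conversation in progress
    return now - human_mode_started_at < HUMAN_MODE_TTL
```

Computing expiry at read time means no cleanup job is needed for the bot to resume; stale rows can be purged whenever convenient.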

To answer your earlier question — yes, live setup. One clinic active since late March, second client onboarding now. The patterns I shared all come from production bugs we hit and fixed. Curious what classification patterns you’re building on your end.

On what I’m building, I’m at the other end of the spectrum from you. This template is intentionally minimal: single-message classification, no state, no conversation history. It’s aimed at small businesses that have a basic contact form generating leads they’re not routing anywhere.

The Google Sheets output gives them visibility they don’t have today without requiring Postgres or conversation management.

The next layer I’m thinking about is exactly what you’re describing — but as a separate, more advanced template. Your production schema (buffer, human mode flag, follow-up sequences) is a solid reference point for what that looks like in practice.

Glad my suggestion was useful!

What’s the onboarding flow for the second clinic — same WhatsApp number structure or did you have to adapt the workflow?


Same WhatsApp number structure — the webhook and routing logic transferred cleanly. The main adaptation was on the workflow side: the second clinic had a different triage logic (they route by specialty, not just urgency), so I had to rework the classification prompt and add an extra branch before it hits the buffer. The human mode flag and follow-up sequences stayed identical though — that part is fully reusable. Honestly the second onboarding was about 30% of the effort of the first one once the base structure was solid.

Makes sense. The classification node is always the part that doesn’t transfer. How are you storing the specialty taxonomy? Hardcoded in the prompt or pulling from a sheet/db so the clinic can edit it themselves?