Help: Optimizing a WhatsApp Restaurant Agent (Gemini Flash) – Solving Hallucinations, Latency, and Double-Texting

MAIN WORKFLOW_SEVERIN PLUS.json (71.9 KB)
My restaurant bot (Severin Plus) is experiencing high latency and “double-text” errors. The current architecture is a linear synchronous flow:

  1. WhatsApp Trigger → Duplicate Filter → HTTP Store Status Check → HTTP Typing Indicator → AI Agent (Gemini 1.5 Flash + 6 Tool sub-workflows) → Send Message.

Main Issues:

  • Webhook Retry Bug: Users often have to send a message twice. I believe the linear flow (multiple HTTP requests and tool calls) exceeds Meta’s 5-second webhook timeout, causing it to retry the message because a 200 OK wasn’t sent fast enough.
  • High Latency: Every sub-workflow tool adds execution overhead. The sequential nature makes the bot too slow for live service.
  • Concurrency/Hallucinations: The AI gets confused with simultaneous users. I suspect Simple Memory (5-message window) is failing to handle concurrent sessions reliably.

Request for Guidance: I want to move to a production-ready Worker/Queue pattern. Specifically, I need advice on:

  • Decoupling Response: How to immediately respond with a 200 OK and handle the AI logic asynchronously in the background.
  • Parallelization: How to run the status check and typing indicator simultaneously.
  • Persistent Memory: Best practices for moving from Simple Memory to Postgres/Redis for high concurrency.

Note: My workflow JSON was too large to paste; I have attached the file to this post. Any help would be appreciated!

Hi @Thoth_AI Welcome!

You are right, just make sure not to exceed Meta’s limits.

I have reviewed your workflow: please avoid connecting MULTIPLE sub-workflows to a single AI Agent node. That change alone would really help decrease the overall latency of your workflow.

Again, this ties into how you leverage AI in your workflow. As I can see, it currently uses AI at a really large scale and calls a lot of different flows, which might not cause hallucinations in initial runs but will cause them in production once the bot is used extensively. Please also consider a proper database like Supabase, which is a genuinely scalable solution.

How do I ensure so many sub-workflows are not connected to a single AI agent?

What strategy can I use to get the best efficiency?

@Thoth_AI Consider dividing your sub-workflows across multiple AI agents, so that each agent handles a different type of sub-workflow; this ensures a proper division of tasks across agents. If possible, reducing the number of sub-workflows is also a good move. You can also use this:

If you have tasks where you just need another AI, use a sub AI Agent node. This approach ensures a proper division of tasks, and it reduces the overhead on any single AI agent, which lowers hallucination risks in production.

Session ID is usually the real culprit, not the model. If sessionId isn’t scoped to the WhatsApp number, Simple Memory bleeds context across users. Just use `{{ $json.from }}` as your sessionId and it fixes it straight away.

For double-texting: send the 200 OK back to the webhook immediately, before the agent even starts processing. Meta stops retrying as soon as it gets the 200.
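In n8n that usually means setting the Webhook node’s Respond option to respond immediately (or putting a Respond to Webhook node before the agent). If you ever move the worker outside n8n, the same pattern in plain TypeScript looks roughly like this sketch (Express-style; `handleWithAgent` is an illustrative placeholder, not something from the attached workflow):

```typescript
import express from "express";

const app = express();
app.use(express.json());

app.post("/whatsapp-webhook", (req, res) => {
  // Ack first: Meta sees the 200 within milliseconds and never retries.
  res.sendStatus(200);

  // Then hand the payload to the slow path without awaiting it.
  handleWithAgent(req.body).catch((err) => console.error("agent failed:", err));
});

// Placeholder for the dedup check, typing indicator, AI agent call, and reply.
async function handleWithAgent(payload: unknown): Promise<void> {
  // ...
}

app.listen(3000);
```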

Three production patterns that address all three issues in the title:

**1. Double-texting — deduplication is the missing piece**

@Pavel_Kuzko is right that sending 200 OK immediately stops Meta’s 5-second timeout from retrying. But if your processing ever fails after the 200, the retry will carry the same message_id — and without a dedup layer you’ll process it twice.

The robust fix: DB-level deduplication keyed on message_id. Early in your workflow, before any AI call:

```sql
INSERT INTO message_log (message_id, chat_id, received_at)
VALUES ($1, $2, NOW())
ON CONFLICT (message_id) DO NOTHING
```

Check `rowsAffected`. If `0` → this `message_id` was already processed → exit immediately. If `1` → new message → continue. The `ON CONFLICT DO NOTHING` is atomic, so even if two webhook retries arrive within milliseconds of each other, only one proceeds.
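If you run this gate from a Code node or an external worker instead of a Postgres node, a minimal TypeScript sketch of the same check (using node-postgres, with the table and columns from the snippet above) would be:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // reads connection settings from PG* env vars

// Returns true if this message_id is new and the workflow should continue.
async function claimMessage(messageId: string, chatId: string): Promise<boolean> {
  const result = await pool.query(
    `INSERT INTO message_log (message_id, chat_id, received_at)
     VALUES ($1, $2, NOW())
     ON CONFLICT (message_id) DO NOTHING`,
    [messageId, chatId]
  );
  return result.rowCount === 1; // 0 means a retry we already handled -> exit
}
```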

**2. Hallucinations — system prompt guardrails that actually work**

The most reliable pattern is a hard forbidden-phrases list at the top of your system prompt, above everything else:

```
ABSOLUTE GUARDRAIL — NEVER say these unless you are transferring to a human:
- "I'll check and get back to you"
- "Our team will contact you"
- "I'll look into that"
- Any phrase that implies future action you won't take

If you cannot answer right now with available information, say so directly and ask a clarifying question. Do NOT promise follow-up.
```

The key is making the guardrail structural: tie it to a handoff tag. Tell the AI: "if you cannot help, output `{handoff}` in your reply." Then your workflow strips the tag, notifies the human, and the AI never lies to the customer about what it can do.
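A minimal sketch of that routing step in TypeScript (`notifyHuman` and `sendWhatsAppReply` are illustrative placeholders for your own alerting and send nodes):

```typescript
// Placeholders for your own alerting and WhatsApp send steps.
async function notifyHuman(chatId: string, context: string): Promise<void> { /* e.g. Slack/email alert */ }
async function sendWhatsAppReply(chatId: string, text: string): Promise<void> { /* WhatsApp send call */ }

// Strip the {handoff} tag before the customer ever sees it.
async function routeReply(chatId: string, aiReply: string): Promise<void> {
  if (aiReply.includes("{handoff}")) {
    await notifyHuman(chatId, aiReply); // staff gets the full, untouched reply
    await sendWhatsAppReply(chatId, aiReply.replace("{handoff}", "").trim());
  } else {
    await sendWhatsAppReply(chatId, aiReply);
  }
}
```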

Context bleed is the other hallucination trigger. If sessionId isn't scoped tightly per user (e.g. `{{ $json.from }}_v1`), Simple Memory mixes up customer contexts and the AI confidently answers with the wrong customer's data. Bumping the version suffix (`_v2`, `_v3`) whenever you change the prompt is also a clean way to wipe stale context for everyone.
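For illustration, the scoped, versioned session key as a tiny helper (`MEMORY_VERSION` is an assumed constant you would bump on prompt changes):

```typescript
const MEMORY_VERSION = "v2"; // bump this whenever you change the system prompt

// msg.from is the sender's WhatsApp number ({{ $json.from }} in n8n terms).
function sessionKey(msg: { from: string }): string {
  return `${msg.from}_${MEMORY_VERSION}`; // e.g. "4915551234567_v2"
}
```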

**3. Latency — buffer rapid-fire messages, hit AI once**

With 6 tool sub-workflows, the real killer isn't the tools themselves — it's the AI being triggered 3× for "ok / sounds good / what time?" sent as three separate messages. Buffer pattern:

- INSERT each message into a `msg_buffer` table (`chat_id`, `content`, `inserted_at`)
- Wait node: 8–10 seconds
- SELECT all buffered messages for that `chat_id` → concatenate → DELETE buffer → send one combined message to the AI
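A rough end-to-end sketch of that buffer in TypeScript with node-postgres (`msg_buffer` columns as in the list above; the 9-second sleep stands in for the Wait node). One assumption flagged: it folds the SELECT and DELETE into a single atomic `DELETE ... RETURNING`, so two concurrent waits can never both walk away with the same batch:

```typescript
import { Pool } from "pg";

const pool = new Pool();
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Buffer one incoming message, wait out the burst, then drain the batch.
// Returns the combined text for a single AI call, or null if another run already drained it.
async function bufferAndCollect(chatId: string, content: string): Promise<string | null> {
  await pool.query(
    "INSERT INTO msg_buffer (chat_id, content, inserted_at) VALUES ($1, $2, NOW())",
    [chatId, content]
  );

  await sleep(9_000); // the Wait node's 8-10 s window

  // Drain and read atomically; concurrent runs can't double-send.
  const drained = await pool.query(
    "DELETE FROM msg_buffer WHERE chat_id = $1 RETURNING content, inserted_at",
    [chatId]
  );
  if (drained.rowCount === 0) return null; // an earlier run took the whole batch

  return drained.rows
    .sort((a, b) => a.inserted_at.getTime() - b.inserted_at.getTime())
    .map((row) => row.content)
    .join("\n");
}
```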

One AI call + one tool fan-out beats three parallel AI calls every time. The perceived latency actually drops because the user gets one coherent response instead of three partial ones firing out of order.