Help: Optimizing a WhatsApp Restaurant Agent (Gemini Flash) – Solving Hallucinations, Latency, and Double-Texting

MAIN WORKFLOW_SEVERIN PLUS.json (71.9 KB)
My restaurant bot (Severin Plus) is experiencing high latency and “double-text” errors. The current architecture is a linear synchronous flow:

  1. WhatsApp Trigger → Duplicate Filter → HTTP Store Status Check → HTTP Typing Indicator → AI Agent (Gemini 1.5 Flash + 6 Tool sub-workflows) → Send Message.

Main Issues:

  • Webhook Retry Bug: Users often have to send a message twice. I believe the linear flow (multiple HTTP requests and tool calls) exceeds Meta’s 5-second webhook timeout, causing it to retry the message because a 200 OK wasn’t sent fast enough.
  • High Latency: Every sub-workflow tool adds execution overhead. The sequential nature makes the bot too slow for live service.
  • Concurrency/Hallucinations: The AI gets confused with simultaneous users. I suspect Simple Memory (5-message window) is failing to handle concurrent sessions reliably.
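One cheap mitigation for the retry bug, independent of the queue rework: make your Duplicate Filter key on the WhatsApp message id (the `wamid` in the webhook payload), so a Meta redelivery of the same message is dropped even if the first attempt is still running. Below is a minimal in-process sketch; the class name and TTL are my own choices, and in production you would back this with Redis (`SET NX` + `EXPIRE`) rather than a Python dict:

```python
import time

class DuplicateFilter:
    """Drop webhook deliveries already seen within a TTL window.

    Meta retries a webhook when it doesn't get a 200 OK fast enough,
    redelivering the same message id. Keying on that id makes the
    pipeline idempotent. (Swap the dict for Redis SET NX + EXPIRE
    when you have multiple workers.)
    """

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._seen: dict[str, float] = {}  # message id -> first-seen time

    def is_new(self, message_id: str) -> bool:
        now = time.monotonic()
        # Evict expired entries so the dict doesn't grow unbounded.
        self._seen = {m: t for m, t in self._seen.items() if now - t < self.ttl}
        if message_id in self._seen:
            return False  # a Meta retry of a message we already handled
        self._seen[message_id] = now
        return True
```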

Request for Guidance: I want to move to a production-ready Worker/Queue pattern. Specifically, I need advice on:

  • Decoupling Response: How to immediately respond with a 200 OK and handle the AI logic asynchronously in the background.
  • Parallelization: How to run the status check and typing indicator simultaneously.
  • Persistent Memory: Best practices for moving from Simple Memory to Postgres/Redis for high concurrency.
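For the first two points, the shape you want looks roughly like this (plain Python outside n8n, just to show the pattern; all function names and the in-memory `queue.Queue` are placeholders — in n8n the ack half is typically a Webhook node responding immediately, with the agent logic in a separate execution, and the queue would be Redis or n8n's own queue mode):

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

jobs: "queue.Queue[dict]" = queue.Queue()  # inbound messages awaiting the agent

def handle_webhook(payload: dict) -> tuple[int, str]:
    """Ack Meta immediately; never run the agent inline."""
    jobs.put(payload)
    return 200, "OK"  # returned well inside the webhook timeout

def check_store_status(chat_id: str) -> bool:
    return True        # placeholder for your HTTP store-status check

def send_typing_indicator(chat_id: str) -> None:
    pass               # placeholder for the WhatsApp typing-indicator call

def run_agent(payload: dict) -> str:
    return "reply"     # placeholder for Gemini 1.5 Flash + tool calls

def send_whatsapp_message(chat_id: str, text: str) -> None:
    pass               # placeholder for the outbound Send Message call

def worker() -> None:
    with ThreadPoolExecutor(max_workers=2) as pool:
        while True:
            payload = jobs.get()
            chat_id = payload["chat_id"]
            # Fire both side calls at once instead of sequentially.
            status_f = pool.submit(check_store_status, chat_id)
            typing_f = pool.submit(send_typing_indicator, chat_id)
            typing_f.result()
            if status_f.result():
                reply = run_agent(payload)  # slow part, off the hot path
                send_whatsapp_message(chat_id, reply)

threading.Thread(target=worker, daemon=True).start()
```

The key property: the webhook handler's only job is enqueue-and-ack, so Meta's timeout can never fire, which also eliminates the retry-driven double texts.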

Note: My workflow JSON was too large to paste; I have attached the file to this post. Any help would be appreciated!

1 Like


Hi @Thoth_AI, welcome!

You are right about the webhook timeout; just make sure not to exceed Meta's limits:

I have reviewed your workflow. Please avoid connecting MULTIPLE sub-workflows to a single AI Agent node; reducing that fan-out would really help decrease the overall latency of your workflow.

This also ties into how you leverage AI in your workflow. Right now it calls a lot of different flows at a large scale, which might not cause hallucinations in initial runs but will in production once it is used extensively. Please also consider using a proper database like Supabase, which is a genuinely scalable solution.
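On the memory side specifically: the concurrency confusion usually comes from history that is not strictly partitioned per user in a shared store. A minimal sketch of the shape, with a plain dict standing in for Redis or a Postgres table (the class name and `window` parameter are my own; with Redis you would use `LPUSH`/`LTRIM` per user key instead):

```python
import json

class SessionMemory:
    """Windowed chat history, strictly keyed per user, on a shared store.

    `store` is anything dict-like; in production swap it for Redis or a
    Postgres table so every concurrent worker sees the same history,
    instead of per-execution Simple Memory.
    """

    def __init__(self, store: dict, window: int = 5):
        self.store = store
        self.window = window  # mirrors the 5-message window

    def append(self, user_id: str, role: str, text: str) -> None:
        history = json.loads(self.store.get(user_id, "[]"))
        history.append({"role": role, "text": text})
        # Trim to the last N turns so the prompt stays bounded.
        self.store[user_id] = json.dumps(history[-self.window:])

    def load(self, user_id: str) -> list:
        return json.loads(self.store.get(user_id, "[]"))
```

Because every read and write goes through the user's own key, two simultaneous customers can never bleed into each other's context.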

How do I ensure so many sub-workflows are not connected to a single AI agent?

What strategy can I use to get the best efficiency?

@Thoth_AI Consider dividing your sub-workflows across multiple AI agents, so that each agent handles a different type of sub-workflow and tasks are properly divided across agents. If possible, cutting some sub-workflows entirely is also a good move. You can also use this:

That way, for tasks where you just need another AI, you can use a sub AI agent node. This approach ensures a proper division of tasks and reduces the load on any single AI agent, which lowers the hallucination risk in production.
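To make the "divide the tools across agents" idea concrete, here is a hedged sketch of a router in front of specialist agents. The tool names, keyword lists, and the keyword-matching classifier are all hypothetical placeholders — in practice the routing step would itself be an LLM call or the main agent delegating to sub-agent nodes:

```python
# Hypothetical split: each specialist agent owns a small, related set
# of tools instead of one agent juggling all six sub-workflows.
AGENT_TOOLS = {
    "orders":       ["create_order", "cancel_order"],
    "menu":         ["get_menu", "check_availability"],
    "reservations": ["book_table", "check_hours"],
}

# Crude keyword stand-in for an LLM-based intent classifier.
KEYWORDS = {
    "orders":       ("order", "cancel", "delivery"),
    "menu":         ("menu", "price", "vegan"),
    "reservations": ("table", "book", "reservation"),
}

def route(message: str) -> str:
    """Pick the specialist agent for an incoming message."""
    text = message.lower()
    for agent, words in KEYWORDS.items():
        if any(w in text for w in words):
            return agent
    return "menu"  # hypothetical default specialist
```

Each specialist then sees only two tools in its prompt, which is where the latency and hallucination savings come from: smaller tool schemas, shorter prompts, fewer wrong-tool calls.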