I’m building an AI agent on WhatsApp, and some questions have come up regarding performance and infrastructure.
When I ran the workflow JSON through several AIs, they all suggested splitting the flow into at least three parts: 1. receiving messages; 2. processing with an AI Agent and its tools; 3. responding via the WhatsApp API.
Is this breakdown a good practice in n8n development?
Does it really improve performance on the n8n cloud?
Another issue: I’m using Redis to buffer fragmented messages (“Hello”, “How are you”, “How’s it going”, “I want…”). For this I put a 10 s Wait in the middle of the comparison, and the AIs that reviewed the flow flagged it as a major performance offender, but they didn’t propose an adequate alternative. How should I handle this?
One more thing: Agent nodes (and similar nodes) always need a model and a memory. I had set up a separate connection for each, and the AIs suggested connecting everything to a single model node and a single memory node. Is that correct?
Splitting workflows - Yes, splitting into receive - process - respond is good practice. It keeps each workflow focused and easier to reuse or tune. Where possible, prefer one root workflow that calls reusable child workflows.
Performance - Yes. On n8n Cloud, modular workflows scale better and let you tune concurrency and timeouts per workflow. On self-hosted instances it also helps avoid memory issues.
Message buffering - The 10 s Wait is the bottleneck: every message pays the full delay. A better pattern is Redis with a per-user buffer plus a lock: the first message acquires the lock and does the processing, while later messages just append to the buffer. Use “move to start” for the critical sections.
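To make the buffer + lock idea concrete, here is a minimal in-memory sketch in plain JavaScript. The Map/Set stand in for Redis (the equivalent Redis commands are noted in comments); the function names, key names, and the 50 ms debounce are all illustrative, not taken from the actual workflow:

```javascript
// In-memory stand-ins for Redis structures:
const buffers = new Map(); // Redis: RPUSH buf:<user> / LRANGE / DEL
const locks = new Set();   // Redis: SET lock:<user> 1 NX EX 30 / DEL

// Called once per incoming WhatsApp message fragment.
async function onMessage(userId, text, processFn, debounceMs = 50) {
  // 1. Always append the fragment to the user's buffer (RPUSH).
  if (!buffers.has(userId)) buffers.set(userId, []);
  buffers.get(userId).push(text);

  // 2. Only the first message acquires the lock (SET NX);
  //    later fragments return immediately instead of each waiting 10 s.
  if (locks.has(userId)) return null;
  locks.add(userId);

  try {
    // 3. One short debounce for the whole burst, not one per message.
    await new Promise((resolve) => setTimeout(resolve, debounceMs));
    // 4. Drain the buffer and process everything as one prompt (LRANGE + DEL).
    const parts = buffers.get(userId) ?? [];
    buffers.delete(userId);
    return processFn(parts.join(" "));
  } finally {
    locks.delete(userId); // release the lock (DEL)
  }
}
```

With this shape, a burst of fragments ("Hello", "How are you") results in a single agent call on the joined text, and only the lock holder ever waits.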
Shared model/memory - Correct: use a single memory node with a shared session key (e.g. the user ID) so all agents share the same context. The same goes for model configuration - centralize and reuse it. That said, always consider the individual case: you may not want to mix memory between agents, or you may want to scope memory to sub-agents, which is also fine.
The following explains my “Cancel & re-run with bigger scope” approach, but it’s still fairly complex, so I’d avoid it unless it’s truly necessary:
It uses Ainoflow instead of Redis, but the idea is the same - just JSON storage. When the subworkflow is called, it adds the message to the queue and tries to acquire a lock. Once the lock is acquired, it reads all messages from the queue and sends them to the agent. After processing, it checks whether the queue has new messages:
if yes - it reverts the last response, releases the lock, and lets another workflow continue;
if no - it responds to the client, clears the queue, and unlocks processing.
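The queue/lock loop above can be sketched in plain JavaScript. This is an illustrative single-process approximation (an in-memory array stands in for the JSON storage, and the re-run happens in a loop rather than a separate workflow execution; all names are mine, not from the original flow):

```javascript
const queue = [];   // stand-in for the JSON-storage message queue
let locked = false; // stand-in for the processing lock

async function handle(message, agentFn, sendFn) {
  queue.push(message);        // every call enqueues its message
  if (locked) return;         // the current lock holder will pick it up
  locked = true;
  try {
    while (true) {
      const seen = queue.length;
      // Read ALL queued messages; the queue is not cleared yet.
      const reply = await agentFn(queue.join(" "));
      // New messages arrived during processing: "revert" the reply
      // and re-run with the bigger scope.
      if (queue.length > seen) continue;
      // No new messages: respond to the client, clear the queue, unlock.
      await sendFn(reply);
      queue.length = 0;
      break;
    }
  } finally {
    locked = false;
  }
}
```

Run against the test input described below (two quick messages, then a third after the reply), this produces one aggregated reply for the first two and an individual reply for the third.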
The test input simulates a user sending two messages, then a third after the response. The first two are aggregated into one reply, and the third is handled as an individual response.
I also sometimes prefer a simpler approach: sending a “processing” indicator (an emoji or short message) so the user sees a response is coming, then removing it after processing. Both methods work well depending on the use case.