Chat-based Turn Detection for Fewer Interruptions from AI Chat Agents

Struggling with your AI agent interrupting users during two-way conversations? This happens because your agent can’t distinguish partial messages from pauses, so it can’t tell when users have finished their “turn”. Turn detection is a technique to solve this, and it comes in many flavours. In this article, I share an experiment with detecting “end of utterance” in n8n.

Hey there :wave: I’m Jim, and if you’ve enjoyed this article, please consider giving me a “like” and a follow on LinkedIn or X/Twitter. For more on similar topics, check out my other AI posts in the forum.


“Predicting end of utterance” is a technique for reducing interruptions from AI voice agents, and a topic I picked up whilst listening in on Tom Shapland’s (@ LiveKit) talk at the AI Engineer World’s Fair 2025, which took place on 5th June.

The technique boils down to analysing recent message history and predicting when a user “finishes” their message, which doesn’t always coincide with when they pause. It’s a deceptively simple concept, but it was quite difficult to implement!

This prediction task is typically solved with traditional ML, but I wanted to see how well an LLM would fare. Of course, I built this in n8n, as I have a few chat agents that could benefit from it, and though it was a struggle, I did manage to implement a working example to test the technique. My initial concern was that this would be too slow, but it honestly wasn’t that bad and, in my opinion, is a totally feasible alternative to buffering user messages via a timeout.

Good to know
“Turn detection” typically assumes we’re talking about voice, but this article’s use case is instant text messaging scenarios such as Telegram and WhatsApp. If you’re interested in turn detection for audio, check out these resources:


How it works

Examples shown: interview; food order

1. Chat Messaging with Async Responses (Telegram, WhatsApp)

Because we need to buffer messages, we need total control over when the AI agent responds. Unfortunately, this rules out using the built-in n8n chat trigger, as I don’t think there is a way to skip replying to user messages. This isn’t much of a dealbreaker if you’re using Telegram or WhatsApp, however, as both work very well with this technique.

2. Redis for Session Management

To store the buffered messages and the state of the conversation, I chose Redis as my data store. In my experience, Redis is just really fast at holding bits of ephemeral data and getting them back out again. Additionally, Redis’ ability to automatically expire records makes for easier data management.
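To make the buffering concrete, here’s a minimal sketch of how the session buffer might look. The key names, TTL value, and helper names are my own illustration, not the template’s actual schema; `store` stands in for any Redis-like client (e.g. `redis.Redis()` from redis-py in production).

```python
SESSION_TTL = 300  # seconds; Redis auto-expires idle buffers for us

def buffer_message(store, chat_id: str, text: str) -> None:
    """Append an incoming partial message to this chat's turn buffer."""
    key = f"chat:{chat_id}:buffer"
    store.rpush(key, text)          # RPUSH keeps messages in arrival order
    store.expire(key, SESSION_TTL)  # refresh the expiry on every message

def read_buffer(store, chat_id: str) -> list:
    """Fetch everything the user has said so far this turn."""
    return store.lrange(f"chat:{chat_id}:buffer", 0, -1)
```

Refreshing the TTL on every message means a conversation only expires after the user has gone quiet, which is exactly the clean-up behaviour we want.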

3. The Prediction Loop using n8n’s Wait functionality

The “prediction” step is where I use an LLM to analyse the buffered user messages and determine whether the agent should send a reply. The analysis uses a text classifier with two outputs: true if end of turn is detected, or false if there’s likely more to follow.
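As a sketch of what that classifier could look like, here’s a hypothetical prompt plus a parser for the model’s answer. The prompt wording is my own guess at the task, not the template’s actual prompt, and you’d feed it to whatever chat-completion API your n8n credentials point at.

```python
# Hypothetical end-of-turn classifier prompt (illustrative wording).
CLASSIFIER_PROMPT = """You are a turn-detection classifier for a chat agent.
Given the user's buffered messages, answer with exactly one word:
"true" if the user has finished their turn, or
"false" if more is likely to follow.

Buffered messages:
{messages}
"""

def parse_prediction(raw: str) -> bool:
    """Map the model's free-text output onto the classifier's two branches."""
    return raw.strip().lower().startswith("true")
```

Constraining the model to a single-word answer keeps the inference fast and the parsing trivial, which matters when this runs on every loop iteration.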

The challenge with this kind of workflow is that you don’t necessarily want to run inference on every execution. The risk is that with predictions running in parallel, you could send duplicate replies and let the turn state go out of sync.

My solution to this problem was to implement a single loop which only starts when the first message of a new user turn is received. When the prediction is false, the loop waits for more buffered messages before coming back around and trying the prediction again. To implement the “waiting” for new messages, I used n8n’s Wait node.
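The loop’s control flow can be sketched like this. The callables are stand-ins for the n8n pieces (classifier call, reply node, Wait node), and the `max_rounds` safety valve is my own addition, not something the source describes.

```python
def prediction_loop(read_buffer, predict, send_reply, wait_for_resume,
                    max_rounds: int = 10) -> int:
    """Run until the classifier says the turn ended, waiting for new
    buffered messages between rounds. Returns the number of rounds run."""
    for round_no in range(1, max_rounds + 1):
        if predict(read_buffer()):  # true => end of turn detected
            send_reply()
            return round_no
        wait_for_resume()           # paused until the resume webhook fires
    send_reply()                    # hypothetical safety valve: reply anyway
    return max_rounds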

The trick is very much in the “waiting”. Rather than being time-based, the Wait node is set to be resumed by webhook, whereby an HTTP request must be made to the execution’s resumeUrl to continue. When we receive new messages and detect that a prediction loop is already active, instead of starting a new prediction, we buffer the message and call the resumeUrl. This wakes the “waiting” execution, which resumes the prediction.
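The incoming-message side of that hand-off might look like the sketch below. The key names are illustrative; `store` is any Redis-like client holding the active loop’s resumeUrl, and `post` performs the HTTP request (e.g. `requests.post` in production).

```python
def handle_incoming(store, chat_id: str, text: str, post) -> str:
    """Route one incoming message: always buffer it, then either wake the
    already-waiting execution or report that a new loop should start."""
    store.rpush(f"chat:{chat_id}:buffer", text)
    resume_url = store.get(f"chat:{chat_id}:resume_url")
    if resume_url:
        post(resume_url)  # wake the Wait node; the loop re-reads the buffer
        return "resumed"
    return "started"      # no active loop: kick off a new prediction loop
```

The presence of a stored resumeUrl doubles as the “is a loop already active?” check, so no separate lock is needed for the happy path.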


The Template

I’m pleased to share my turn detection template with Telegram on my creator page.
Check it out and let me know if you get around to testing it - would love your feedback to improve it! Of course, feel free to swap out Telegram or Redis for your preferred tools.


Conclusion

  1. Turn detection definitely feels like a more sophisticated solution. There are noticeably fewer interruptions, and as a user I don’t feel rushed to reply; when I’m finished, the agent replies with little delay, which feels like a more “natural”, “realtime” conversation.
  2. Using LLMs for turn detection can actually work! I don’t think I’ve used the most optimal prompt or the fastest model for the prediction task, so there’s plenty of room for improvement.

Do you know of other turn detection methods or approaches? Please recommend them in the comments below and I’ll check them out and compare. Thanks! <End Turn>.


Still not signed up to n8n cloud? Support me by using my n8n affiliate link.

Need more AI templates? Check out my Creator Hub for more free n8n x AI templates - you can import these directly into your instance!