Voice Chatbots with a Voice LLM

The popularity of n8n is on the rise, primarily because it offers a convenient platform for building text-based AI chatbots, such as those integrated with Telegram. The next frontier is the development of voice-based AI chatbots. Relying on outdated methods—like converting speech to text, processing it through a large language model (LLM), synthesizing the response back into speech, and then delivering it—seems inefficient. Modern neural network models are now capable of processing voice inputs directly, eliminating the need for intermediate transcription and synthesis steps. Moreover, integrating voice capabilities is feasible through technologies like WebRTC.
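To make the comparison concrete, here is a minimal Python sketch of the two architectures. It is only an illustration under loud assumptions: `transcribe`, `chat`, `synthesize`, and `voice_llm` are hypothetical stand-ins that shuffle bytes and strings around. They are not n8n nodes or any vendor's API, and no real audio is processed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Stage:
    """One hop in a voice pipeline (a model call in a real system)."""
    name: str
    run: Callable[[Any], Any]


def run_pipeline(stages: List[Stage], payload: Any) -> Tuple[Any, List[str]]:
    """Pass the payload through each stage in order and record the hops taken."""
    trace: List[str] = []
    for stage in stages:
        payload = stage.run(payload)
        trace.append(stage.name)
    return payload, trace


# Hypothetical stand-ins: each one fakes a model by echoing its input.
def transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")      # pretend speech-to-text


def chat(text: str) -> str:
    return f"echo: {text}"            # pretend text LLM


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")       # pretend text-to-speech


def voice_llm(audio: bytes) -> bytes:
    return b"echo: " + audio          # pretend audio-in, audio-out model


# Cascaded approach: three separate hops, with text as the intermediate format.
cascaded = [
    Stage("speech-to-text", transcribe),
    Stage("text LLM", chat),
    Stage("text-to-speech", synthesize),
]

# Direct approach: a single hop that takes audio and returns audio.
direct = [Stage("voice LLM", voice_llm)]

audio_in = b"hello"
out_cascaded, trace_cascaded = run_pipeline(cascaded, audio_in)
out_direct, trace_direct = run_pipeline(direct, audio_in)

print(trace_cascaded, out_cascaded)
print(trace_direct, out_direct)
```

The only point of the sketch is the shape: the cascaded version makes three hops and forces everything through text in the middle, while the direct version makes one hop and never leaves the audio domain.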

Is there hope that, in the near future, we will have an AI agent that accepts voice input directly, without first converting it to text? Such a system, especially when combined with dynamic tools, would be a significant advance in AI automation.


These sure are exciting times. By the way, if you haven't used them yet,

are very good. I see some people connecting MCP servers or webhooks to AI agents for near-real-time replies, which is quite remarkable.

Best regards,

Samuel

Vapi is outdated technology: it still uses the transcription, LLM, speech-synthesis pipeline. ElevenLabs does not want to work with Russia. They also offer a service that is an alternative to n8n. I would really like to create scripts with voice LLMs specifically in n8n!