Making an AI-Powered Voice Agent Using Twilio and n8n

Mookie_Lian · August 13, 2025, 12:43pm

Hello everyone,

I need help.

I’m making a voice agent using n8n. I know that n8n doesn’t work well with WebSockets and works best with REST, but I still wanted to give it a chance. The voice agent I have in mind works like this:

1. I make an outbound call from n8n using the Twilio API.
2. My phone number is configured with a TwiML App that sends requests to my self-hosted websocket server.
3. The websocket server sends three request types to n8n: Call Starts, Call Ends, and User Transcript.
4. Each type is a Webhook trigger in my n8n instance and has its own flow.
5. The User Transcript flow is AI-powered: the AI treats the transcript as a prompt and generates a response.
6. Another node sends that response to Google Cloud TTS.
7. The last node plays the reply back to the user in the same call.
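A minimal sketch of the routing in steps 3 and 4, assuming the websocket server receives Twilio Media Streams JSON messages (events such as `start`, `media`, `stop`) and posts to one n8n Webhook URL per flow. The webhook URLs and function names are hypothetical, and the transcript route assumes the server runs its own speech-to-text step; this is not taken from the linked repository.

```python
import json
from typing import Optional, Tuple

# Hypothetical n8n webhook URLs, one per workflow (assumption, not from the post).
WEBHOOKS = {
    "call_start": "https://n8n.example.com/webhook/call-start",
    "call_end": "https://n8n.example.com/webhook/call-end",
    "user_transcript": "https://n8n.example.com/webhook/user-transcript",
}

Route = Tuple[str, dict]


def route_twilio_message(raw: str) -> Optional[Route]:
    """Map one Twilio Media Streams message to (webhook URL, JSON payload).

    Only "start" and "stop" are forwarded. "connected", "media", "mark" and
    other events return None, so raw audio frames never reach n8n.
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        return WEBHOOKS["call_start"], {
            "callSid": msg["start"]["callSid"],
            "streamSid": msg["streamSid"],
        }
    if event == "stop":
        return WEBHOOKS["call_end"], {
            "callSid": msg["stop"]["callSid"],
            "streamSid": msg["streamSid"],
        }
    return None


def route_transcript(call_sid: str, text: str) -> Route:
    """Transcripts come from the server's own speech-to-text step, not from Twilio."""
    return WEBHOOKS["user_transcript"], {"callSid": call_sid, "text": text}
```

Carrying the `callSid` in every payload is the design choice that matters here: it is what lets the last node in the User Transcript flow target the same live call.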

This sounds real-time, but it isn’t quite. My websocket fires requests at n8n like a machine gun, so the workflow triggers a thousand times before the AI gets a chance to speak. I added ‘latest message’ logic so that only the most recent transcript gets answered, which should keep the workflow from being overwhelmed. I haven’t been able to test it yet because my Twilio free trial has ended, and that’s why I’m asking for help: has anyone here built something like this while choosing not to use ElevenLabs, Vapi.ai, or another overpriced service?
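One way to express the ‘latest message’ idea is a per-call debounce in the websocket server, so the burst never reaches n8n at all. This is a swapped-in placement, not necessarily how the post implements it. A minimal sketch, assuming each transcript event carries the whole utterance recognized so far (like interim speech-to-text results), so a newer transcript can simply replace the older one. The class name and the 0.8 s quiet window are hypothetical choices.

```python
from typing import Dict, List, Tuple


class LatestTranscriptDebouncer:
    """Keep only the newest transcript per call and release it after a quiet gap."""

    def __init__(self, quiet_seconds: float = 0.8) -> None:
        self.quiet_seconds = quiet_seconds
        # call_sid -> (latest transcript text, time it arrived)
        self._pending: Dict[str, Tuple[str, float]] = {}

    def push(self, call_sid: str, text: str, now: float) -> None:
        """Record a transcript; a newer one for the same call replaces the older one."""
        self._pending[call_sid] = (text, now)

    def ready(self, now: float) -> List[Tuple[str, str]]:
        """Return (call_sid, text) pairs quiet for at least quiet_seconds, and forget them."""
        done = [
            (sid, text)
            for sid, (text, arrived) in self._pending.items()
            if now - arrived >= self.quiet_seconds
        ]
        for sid, _ in done:
            del self._pending[sid]
        return done


if __name__ == "__main__":
    d = LatestTranscriptDebouncer(quiet_seconds=0.8)
    d.push("CA1", "hello", now=0.0)
    d.push("CA1", "hello how are", now=0.3)
    d.push("CA1", "hello how are you", now=0.5)
    print(d.ready(now=1.0))  # [] : only 0.5 s of silence so far
    print(d.ready(now=1.5))  # [('CA1', 'hello how are you')]
```

In a real server you would pass `time.monotonic()` as `now`, call `ready()` on a short timer (for example every 100 ms), and send one User Transcript request per returned pair. If your speech-to-text emits separate fragments rather than cumulative text, concatenate in `push()` instead of replacing.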

I’m using Tmpfiles.org to create a public URL for the audio file generated by the TTS service. The websocket server I’m using is on GitHub: MookieLian/twilio-n8n-media-stream-master
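For the last two nodes, here is a sketch of the two outgoing request bodies. It assumes Google Cloud Text-to-Speech v1 `text:synthesize`, which returns base64 `audioContent` that you would decode before uploading to Tmpfiles.org. It also assumes Twilio’s “update a Call” REST endpoint with an inline `Twiml` parameter. `ACCOUNT_SID`, `STREAM_URL` and the helper names are hypothetical placeholders. Authentication is left out: Google needs an OAuth token, and Twilio uses HTTP Basic with the Account SID and Auth Token.

```python
import json
from typing import Tuple
from xml.sax.saxutils import escape, quoteattr

# Hypothetical placeholders (assumptions, not from the post).
ACCOUNT_SID = "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
STREAM_URL = "wss://voice.example.com/media"


def tts_request_body(text: str, language_code: str = "en-US") -> str:
    """JSON body for POST https://texttospeech.googleapis.com/v1/text:synthesize."""
    return json.dumps({
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    })


def reply_twiml(audio_url: str) -> str:
    """Play the generated audio, then reopen the media stream so the caller can keep talking."""
    return (
        "<Response>"
        f"<Play>{escape(audio_url)}</Play>"
        f"<Connect><Stream url={quoteattr(STREAM_URL)} /></Connect>"
        "</Response>"
    )


def call_update_request(call_sid: str, audio_url: str) -> Tuple[str, dict]:
    """URL and form fields for Twilio's 'update a Call' endpoint with inline TwiML."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{ACCOUNT_SID}/Calls/{call_sid}.json"
    return url, {"Twiml": reply_twiml(audio_url)}
```

`audio_url` must be a direct link to the MP3 file, not an HTML download page. Updating a live call redirects it to the new TwiML. With a `<Connect><Stream>` setup, the current stream will probably end and a new one start after playback. If so, the Call Ends flow would fire on every reply unless it checks the call status first.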