Hello everyone,
I need help.
I’m making a voice agent using n8n. I know that n8n doesn’t work well with WebSockets and works best with REST, but I still wanted to give it a chance. The voice agent I have in mind is simple:
- I make an outbound call from n8n using the Twilio API
- My phone number is configured with a TwiML App that sends a request to my self-hosted websocket
- My websocket sends three request types to my n8n instance: Call Starts, Call Ends and User Transcript
- Every type is a webhook trigger in my n8n instance and has its own flow
- The User Transcript flow is AI powered: the AI receives the transcript as a prompt and responds to it
- Another node makes a request to Google Cloud TTS
- The last node makes a request to respond to the user in the same call
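For the last step, here is a rough sketch of what the reply TwiML could look like. This is only an assumption about how to do it, not something I have running: it plays the TTS audio with `<Play>` and then reconnects the media stream with `<Connect><Stream>`, on the assumption that replacing the TwiML of a live call stops whatever TwiML was running before. `build_reply_twiml` and both URLs are hypothetical names made up for the example.

```python
import xml.etree.ElementTree as ET


def build_reply_twiml(audio_url: str, stream_url: str) -> str:
    """Build TwiML that plays one audio file, then reopens the media stream.

    Hypothetical helper: audio_url is the public URL of the TTS file,
    stream_url is the wss:// address of the self-hosted websocket.
    """
    response = ET.Element("Response")
    # <Play> tells Twilio to fetch the audio file and play it to the caller.
    ET.SubElement(response, "Play").text = audio_url
    # <Connect><Stream> hands the call audio back to the websocket afterwards.
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": stream_url})
    return ET.tostring(response, encoding="unicode")


# Placeholder URLs, only for illustration.
twiml = build_reply_twiml(
    "https://example.com/reply.mp3",
    "wss://example.com/stream",
)
print(twiml)
```

The resulting string would then be sent to Twilio when updating the live call. As far as I know the Call resource accepts either a `Url` or a `Twiml` parameter for that, but check the docs before relying on it.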
This sounds real time, but it isn’t quite, because my websocket fires requests at my n8n instance like a machine gun: the workflow triggers a thousand times before the AI has a chance to talk. I added a ‘latest message’ logic to keep the workflow from being overwhelmed, but I haven’t tested it yet since my Twilio free trial has ended. That’s why I am asking you for help. I need to know if any of you have done this before without using ElevenLabs, Vapi.ai or any other overpriced product.
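To make the ‘latest message’ idea concrete, here is a minimal sketch of the logic in Python. It is not the code from my workflow (in n8n this would live in a Code node or an external store), and `LatestMessageGate`, `register`, `is_latest` and the `CA123` call ID are names invented for the example. Every incoming transcript gets a token, and a run only goes on to the AI if its token is still the newest one for that call.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LatestMessageGate:
    """Remembers the newest transcript token per call (hypothetical helper)."""

    _latest: Dict[str, int] = field(default_factory=dict)

    def register(self, call_id: str) -> int:
        """Record a new transcript for this call and return its token."""
        token = self._latest.get(call_id, 0) + 1
        self._latest[call_id] = token
        return token

    def is_latest(self, call_id: str, token: int) -> bool:
        """True only if no newer transcript arrived for this call."""
        return self._latest.get(call_id) == token

    def end_call(self, call_id: str) -> None:
        """Forget the call, e.g. when the Call Ends webhook fires."""
        self._latest.pop(call_id, None)


gate = LatestMessageGate()
first = gate.register("CA123")   # partial transcript arrives
second = gate.register("CA123")  # a newer transcript replaces it

print(gate.is_latest("CA123", first))   # False: stale run, skip the AI call
print(gate.is_latest("CA123", second))  # True: this run may continue
```

In practice each run would wait a short moment after `register` before checking `is_latest`, so that a burst of transcripts collapses into a single AI request. The length of that wait is something I would have to tune.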
I’m using Tmpfiles.org to create a public URL for the audio file created by the TTS service. The websocket server I’m using is on GitHub: MookieLian/twilio-n8n-media-stream-master