Hi everyone,
I’m stuck on a workflow and would appreciate your help.
My goal is to allow users to transcribe audio files (m4a, wav, mp3) via a form (using the Form Node). The transcription should be handled by Gemini 2.5 Pro.
The challenge is that I need to send a dynamic text prompt along with the audio file (e.g., “Differentiate between speakers” or “Summarize the key points”). This prompt is also captured in the form. The standard transcription nodes do not seem to support sending both a file and a custom prompt.
I’ve tried to solve this using an AI Agent, but I can’t figure out how to correctly pass both the text prompt and the binary audio data to the API.
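For reference, here is a minimal sketch of the request body I believe the Gemini `generateContent` endpoint expects when a prompt and an audio file go in the same call: one text part plus one base64-encoded `inline_data` part. It is plain Python outside of n8n, just to show the shape. The function name `build_gemini_request` and the MIME type mapping are my own assumptions (the `m4a` value in particular is a guess), not something I have confirmed against the docs.

```python
import base64
import json

# Assumed MIME types per extension; the exact values Gemini accepts
# (especially for m4a) should be checked against its documentation.
AUDIO_MIME_TYPES = {
    "mp3": "audio/mp3",
    "wav": "audio/wav",
    "m4a": "audio/mp4",
}


def build_gemini_request(prompt: str, audio_bytes: bytes, filename: str) -> dict:
    """Build a generateContent-style body with a text part and an audio part.

    Hypothetical helper for illustration only; it performs no network call.
    """
    extension = filename.rsplit(".", 1)[-1].lower()
    mime_type = AUDIO_MIME_TYPES.get(extension)
    if mime_type is None:
        raise ValueError(f"Unsupported audio file type: {filename}")

    return {
        "contents": [
            {
                "parts": [
                    # The dynamic prompt captured in the form
                    {"text": prompt},
                    # The uploaded audio file, base64-encoded inline
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the real uploaded file.
    body = build_gemini_request(
        "Differentiate between speakers", b"\x00\x01\x02", "meeting.m4a"
    )
    print(json.dumps(body, indent=2))
```

If this shape is right, my question comes down to how to get the form's binary data into that `data` field as base64 inside n8n, whether through the AI Agent or some other node.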
How would you approach this in an elegant way?
Thanks in advance!