I’m building an n8n workflow that starts with a Form Trigger node where users can upload either an audio file or a video file. The workflow should:
Detect the type of uploaded file (audio or video).
If it’s a video, convert it to audio.
Extract the audio duration and metadata.
Send the audio file to OpenAI’s Whisper API (or an equivalent speech-to-text service) to generate a transcription.
Can you guide me through designing the complete workflow, including which tools or services to use for the conversion and metadata extraction, and how to set up the Whisper API request?
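For context, here is a rough Python sketch of how I picture the four steps outside of n8n. It is only my assumption of how the pieces fit together, not a working n8n setup: the function names are made up, it assumes the upload exposes a MIME type and that ffmpeg and ffprobe are installed on the host, and it only builds the commands and request parameters rather than running them.

```python
import json
from typing import Dict, List, Optional

# OpenAI's speech-to-text endpoint; the audio goes in a multipart "file" field.
WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"


def classify_media(mime_type: Optional[str]) -> str:
    """Step 1: return 'audio', 'video', or 'unknown' from the upload's MIME type."""
    top_level = (mime_type or "").split("/", 1)[0].lower()
    return top_level if top_level in ("audio", "video") else "unknown"


def extract_audio_cmd(src: str, dst: str) -> List[str]:
    """Step 2: ffmpeg command that drops the video stream (-vn) and writes audio to dst."""
    return ["ffmpeg", "-y", "-i", src, "-vn", dst]


def probe_metadata_cmd(src: str) -> List[str]:
    """Step 3a: ffprobe command that prints container and stream metadata as JSON."""
    return ["ffprobe", "-v", "error", "-show_format", "-show_streams", "-of", "json", src]


def parse_duration_seconds(ffprobe_json: str) -> float:
    """Step 3b: read the duration (ffprobe reports it as a string) from the JSON output."""
    return float(json.loads(ffprobe_json)["format"]["duration"])


def whisper_request_parts(api_key: str) -> Dict[str, object]:
    """Step 4: pieces of the multipart POST; the audio bytes would be attached as 'file'."""
    return {
        "url": WHISPER_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": "whisper-1"},
        "file_field": "file",
    }
```

In n8n terms, I imagine the first function as an If or Switch node on the file's MIME type, the two commands as Execute Command nodes, and the last step as an HTTP Request node sending multipart form data, but I am not sure that is the right mapping.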