I’m looking for a way to set up a workflow in N8N where I can upload an MP4 file, extract its transcription, and then use that transcription to generate new editorial content streams with the help of automated agents.
Does anyone have any suggestions on how to structure this first part of the workflow?
How do you imagine triggering your workflow?
You could:
Upload via an n8n Form trigger
Upload via a POST request to a Webhook node
Upload from a third-party app like Google Drive or Telegram
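Whichever trigger you pick, the POST option is the easiest to test from a script. Here's a minimal Node 18+ sketch; the host and the `upload-video` webhook path are placeholders, so copy the real URL from your Webhook node:

```javascript
// Hypothetical sketch: send an MP4 to an n8n Webhook trigger from Node 18+.
// The URL below is a placeholder -- use your Webhook node's production URL.
const WEBHOOK_URL = "https://your-n8n-host/webhook/upload-video";

// Build the multipart body; n8n exposes the uploaded file as binary data.
function buildForm(bytes, filename) {
  const form = new FormData();
  form.append("file", new Blob([bytes], { type: "video/mp4" }), filename);
  return form;
}

// Actual upload (requires a reachable n8n instance):
async function upload(bytes, filename) {
  const res = await fetch(WEBHOOK_URL, {
    method: "POST",
    body: buildForm(bytes, filename),
  });
  return res.status;
}

// Local demo: build the form without sending it.
const form = buildForm(new Uint8Array([0, 0, 0]), "video.mp4");
console.log(form.get("file").name);
```

The same request works from any app that can send multipart form data, which is why the webhook route is the most flexible of the three triggers.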
Next, you can use OpenAI Whisper (search for “Transcribe Recording”) to get your file transcribed. Once you have the transcript, you can process it with LLM calls / Agents. Did I understand it right that by “first part of the workflow” you mean the transcription?
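Under the hood, the Transcribe Recording operation calls OpenAI's audio transcription endpoint. If you ever need the same thing outside n8n, a hedged sketch (the endpoint and the classic `whisper-1` model id are OpenAI's documented ones, but double-check which models your account can use):

```javascript
// Rough equivalent of the "Transcribe Recording" operation: POST the audio
// as multipart form data to OpenAI's transcription endpoint.
async function transcribe(bytes, apiKey) {
  const form = new FormData();
  form.append("model", "whisper-1"); // classic Whisper model id
  form.append("file", new Blob([bytes], { type: "audio/mpeg" }), "audio.mp3");
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Whisper request failed: ${res.status}`);
  return (await res.json()).text; // transcript as plain text
}
```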
If you want another option, I have also used Groq-hosted Whisper, but for that you would have to use a custom HTTP Request node.
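Groq exposes an OpenAI-compatible API surface, so the custom HTTP Request node (or a script) mostly just needs a different base URL and model name. A sketch, assuming the `whisper-large-v3` model id is still current:

```javascript
// Sketch of a Groq-hosted Whisper call (OpenAI-compatible API surface).
// Endpoint and model id are assumptions -- verify against Groq's docs.
async function transcribeWithGroq(bytes, apiKey) {
  const form = new FormData();
  form.append("model", "whisper-large-v3");
  form.append("file", new Blob([bytes], { type: "audio/mpeg" }), "audio.mp3");
  const res = await fetch("https://api.groq.com/openai/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Groq request failed: ${res.status}`);
  return (await res.json()).text;
}
```

In the HTTP Request node this maps to: method POST, that URL, a Bearer auth header, and a multipart body with the `model` field plus the binary file field.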
Ahh sorry, I misread the file type! Understanding videos can be done by both GPT-4o and Gemini 2.0, but it's not an easy implementation.
It reads like you have to break the videos down into frames.
It does work with inline data for Gemini (the video must be smaller than 20 MB).
So I think uploading the file can easily be done with a Form trigger, but if it's a big video you will need an implementation that matches what they suggest in the post.
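For reference, the inline-data request Gemini expects is just the base64-encoded video (under 20 MB) plus a text prompt in a single `generateContent` call. A sketch; the `gemini-2.0-flash` model id is an assumption, so pick whichever 2.0 model you have access to:

```javascript
// Build the generateContent payload: base64 video inline, then the prompt.
function buildGeminiRequest(base64Video, prompt) {
  return {
    contents: [{
      parts: [
        { inline_data: { mime_type: "video/mp4", data: base64Video } },
        { text: prompt },
      ],
    }],
  };
}

// Send it to the REST endpoint (model id is an assumption -- adjust to yours).
async function describeVideo(base64Video, prompt, apiKey) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `gemini-2.0-flash:generateContent?key=${apiKey}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGeminiRequest(base64Video, prompt)),
  });
  if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`);
  const data = await res.json();
  return data.candidates[0].content.parts[0].text;
}
```

In n8n this would again be an HTTP Request node, with a Code node beforehand to base64-encode the binary data.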
Thinking about simplifying the workflow to run an initial POC: do you think there would be a way for me to simply input already-transcribed material and, from that text, have AI-connected agents generate new content streams for me?
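That POC shape needs nothing video-specific: a manual or form trigger that accepts text, then one LLM call per content stream. A sketch of the prompting step outside n8n, assuming OpenAI's chat completions endpoint; the stream names and model id are illustrative only:

```javascript
// Hypothetical POC: turn an existing transcript into several content streams
// with one chat-completion call per stream. Stream names are examples only.
const STREAMS = ["blog post", "LinkedIn post", "newsletter blurb"];

function buildPrompt(stream, transcript) {
  return `Rewrite the following transcript as a ${stream}:\n\n${transcript}`;
}

async function generateStreams(transcript, apiKey) {
  const out = {};
  for (const stream of STREAMS) {
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "gpt-4o-mini", // assumption -- use a model your account has
        messages: [{ role: "user", content: buildPrompt(stream, transcript) }],
      }),
    });
    if (!res.ok) throw new Error(`LLM request failed: ${res.status}`);
    out[stream] = (await res.json()).choices[0].message.content;
  }
  return out;
}
```

In n8n terms, this is a loop (or parallel branches) of LLM/Agent nodes fed from a single text input; swapping the text source for the transcription step later changes nothing downstream.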