Optimizing Content Automation: Best Way to Extract and Process Transcriptions in n8n?

Hello, everyone!

I’m looking for a way to set up a workflow in n8n where I can upload an MP4 file, extract its transcription, and then use that transcription to generate new editorial content streams with the help of automated agents.

Does anyone have any suggestions on how to structure this first part of the workflow?

Thanks in advance!

Hey @Giulia_Patricio, welcome to the community!

How do you imagine triggering your workflow?
Some options:

  • Upload via an n8n form
  • Upload via a POST request
  • Upload from a third-party app such as Google Drive or Telegram
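For the POST request option, here is a minimal sketch of what the sending side could look like, assuming the workflow starts with a Webhook node that accepts multipart form data. The webhook URL, the `file` field name, and the helper name `build_upload_request` are placeholders for illustration, not values from this thread.

```python
import uuid
import urllib.request


def build_upload_request(webhook_url: str, filename: str, data: bytes,
                         field_name: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying one MP4 file.

    field_name is an assumption; use whatever binary field your
    webhook is configured to read.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: video/mp4\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (placeholder URL, not executed here):
# with open("talk.mp4", "rb") as f:
#     req = build_upload_request(
#         "https://your-n8n.example/webhook/upload-video", "talk.mp4", f.read())
# urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the body before pointing it at a real workflow.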

Next, you can use OpenAI Whisper (search for “Transcribe Recording”) to get your file transcribed. Once you have the transcript, you can process it with LLM calls / Agents. Did I understand it right that by “first part of the workflow” you mean the transcription?

If you want another option, I have also used Groq-hosted Whisper, but for that you would have to use a custom HTTP Request node.
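As a rough sketch of what that custom HTTP call might contain, the snippet below builds the request in Python. The endpoint URL and model name are my assumptions about Groq's OpenAI-compatible API and should be checked against the provider's documentation; `build_transcription_request` is a hypothetical helper name.

```python
import uuid
import urllib.request

# Assumed values: verify both against the provider's documentation.
ASSUMED_URL = "https://api.groq.com/openai/v1/audio/transcriptions"
ASSUMED_MODEL = "whisper-large-v3"


def build_transcription_request(api_key: str, filename: str, data: bytes,
                                url: str = ASSUMED_URL,
                                model: str = ASSUMED_MODEL) -> urllib.request.Request:
    """Build a multipart POST with a 'model' field and a 'file' field."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain text field naming the transcription model.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode("utf-8"),
        # Binary field holding the media file.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"),
        data,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ]
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

In n8n, the same pieces would map to the HTTP Request node settings: a POST method, a bearer token header, and a form-data body with one text field and one binary field.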

Hello, @jksr

Actually, when I refer to the first part of the workflow, I mean how I can automatically send or receive video files.

I imagine one option would be to upload the file within a form trigger, which would then initiate the workflow. Do you think this approach would work?

After that, can I directly connect this workflow to an OpenAI node and link it to the other agents to complete the process?

Thanks in advance!

Ahh sorry, I misread the file type! So I think understanding videos can be done by both GPT-4o and Gemini 2.0, but it's not an easy implementation.

It reads like you have to break the videos down into frames.

It does work with inline data for Gemini (the video must be smaller than 20 MB).
See as follows:

So I think uploading the file can easily be done with a form trigger, but if it's a big video you will need an implementation that matches what they suggest in the post.
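To make the size check concrete, here is a small sketch that either builds an inline payload or refuses when the video is too big. The payload shape reflects my understanding of Gemini's REST request format and should be verified against the official docs. Applying the 20 MB limit to the raw bytes is also an assumption, since base64 encoding makes the actual request larger.

```python
import base64

INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # the 20 MB limit mentioned above


def build_inline_video_payload(video: bytes, prompt: str,
                               limit: int = INLINE_LIMIT_BYTES) -> dict:
    """Return a request body with the video embedded as base64 inline data.

    Raises ValueError when the video is at or above the limit, in which
    case a file-upload approach is needed instead.
    """
    if len(video) >= limit:
        raise ValueError("video too large for inline data")
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": "video/mp4",
                    "data": base64.b64encode(video).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }
```

In a workflow, the same check could sit in an IF node right after the form trigger, routing small files to the inline call and large ones to a separate branch.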

Thank you very much for your help! Do you have any recommendations on how to set up the best approach for AI assistants to work on this content?

What's your goal? I would split it into at least two steps: first the raw extraction from the source file, then the summarizing, classifying, etc.

To create new editorial streams (SEO-optimized blog posts, Instagram posts, and video scripts) based on the transcription of the uploaded videos.

To simplify the workflow for an initial POC, do you think there would be a way for me to simply input already transcribed material and, from that text, have AI-connected agents generate new content streams for me?

Sure, I mean working with text is the easiest. Check out some of the example workflows as well to see how others have done content generation before: Discover 1516 Automation Workflows from the n8n's Community.
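For the text-only POC, one way to picture it is a fan-out: the same transcript goes into one prompt per content stream, and each prompt is sent to a model. The sketch below is only an illustration. The instruction wording, the stream keys, and the `llm` callable (a stand-in for whatever model node or API you use) are all assumptions.

```python
from typing import Callable, Dict

# Illustrative instructions for the three streams mentioned above.
STREAM_INSTRUCTIONS: Dict[str, str] = {
    "blog_post": "Write an SEO-optimized blog post based on the transcript below.",
    "instagram_post": "Write a short Instagram post based on the transcript below.",
    "video_script": "Write a video script based on the transcript below.",
}


def build_prompts(transcript: str) -> Dict[str, str]:
    """Create one prompt per content stream from a single transcript."""
    return {
        name: f"{instruction}\n\nTranscript:\n{transcript}"
        for name, instruction in STREAM_INSTRUCTIONS.items()
    }


def generate_streams(transcript: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Send each prompt to the supplied model callable and collect the outputs."""
    return {name: llm(prompt) for name, prompt in build_prompts(transcript).items()}
```

In n8n, this would roughly correspond to one text input feeding three parallel agent or LLM nodes, each with its own system prompt.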