How can I transcribe large MP3/MP4 files in n8n?

Hi everyone,

I’m trying to build an automated transcription workflow in n8n for large MP3 and MP4 files. I came across the ElevenLabs API, but it’s a bit expensive for my use case.

I’m looking for low-cost or free alternatives, ideally something like Whisper AI. I noticed that Hugging Face hosts Whisper models, and I tried to use them within n8n but couldn’t get them to work properly, either because of authentication issues or input size limits.

Does anyone here have experience with:

  1. Running Whisper AI (via Hugging Face or another service) inside n8n?
  2. Efficiently transcribing large files (over 100MB)?
  3. Hosting Whisper locally, or using an API that integrates well with n8n?

My ideal setup would:

  • Keep costs minimal
  • Handle large file sizes
  • Work smoothly with n8n’s workflow logic

Any working examples, custom node tips, or general guidance would be greatly appreciated!

Thanks in advance 🙏