Hi everyone,
I’m trying to build an automated transcription workflow in n8n for large MP3 and MP4 files. I came across the ElevenLabs API, but it’s a bit expensive for my use case.
I’m looking for low-cost or free alternatives, ideally something like Whisper AI. I noticed that Hugging Face hosts Whisper models, and I tried to use them within n8n but couldn’t get it to work properly — either due to authentication issues or input size limits.
Does anyone here have experience with:
- Running Whisper AI (via Hugging Face or another service) inside n8n?
- Efficiently transcribing large files (over 100MB)?
- Any tips for hosting Whisper locally or using an API that integrates well with n8n?
My ideal setup would:
- Keep costs minimal
- Handle large file sizes
- Work smoothly with n8n’s workflow logic
Any working examples, custom node tips, or general guidance would be greatly appreciated!
Thanks in advance!
Hey! 
Had this exact problem — large video file that just wouldn’t compress enough for Whisper.
The real issue is that the 25MB upload limit on OpenAI’s hosted Whisper API forces you into a messy chain of converting, compressing, splitting, and stitching, and it still breaks on bigger files.
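For anyone who wants to see what that chain involves, here is a rough Python sketch of the planning step: given a target audio bitrate, it works out how long each chunk can be while staying under a 25MB limit, then builds an ffmpeg command that re-encodes to mono MP3 and splits on that interval. The 64 kbps bitrate, the 10% safety margin, the file names, and the helper names are all assumptions for illustration, and it assumes an ffmpeg build that includes libmp3lame. Nothing here is n8n-specific.

```python
import math


def max_chunk_seconds(bitrate_kbps: float, limit_mb: float = 25.0,
                      safety: float = 0.9) -> int:
    """Longest chunk, in whole seconds, that stays under the upload limit.

    `safety` leaves headroom for container overhead (assumed 10%).
    """
    limit_bits = limit_mb * 1024 * 1024 * 8 * safety
    return int(limit_bits // (bitrate_kbps * 1000))


def chunk_count(duration_seconds: float, chunk_seconds: int) -> int:
    """Number of chunks needed to cover the whole recording."""
    return math.ceil(duration_seconds / chunk_seconds)


def ffmpeg_split_command(src: str, out_pattern: str,
                         chunk_seconds: int, bitrate_kbps: int = 64) -> list[str]:
    """Build an ffmpeg command that drops video, re-encodes audio to mono MP3,
    and writes fixed-length segments (e.g. chunk_000.mp3, chunk_001.mp3, ...)."""
    return [
        "ffmpeg", "-i", src,
        "-vn",                        # ignore any video stream
        "-ac", "1",                   # mono is enough for speech
        "-c:a", "libmp3lame",
        "-b:a", f"{bitrate_kbps}k",
        "-f", "segment",
        "-segment_time", str(chunk_seconds),
        out_pattern,
    ]


if __name__ == "__main__":
    seconds = max_chunk_seconds(bitrate_kbps=64)
    print(seconds, "seconds per chunk")
    print(chunk_count(3 * 3600, seconds), "chunks for a 3-hour recording")
    print(" ".join(ffmpeg_split_command("input.mp4", "chunk_%03d.mp3", seconds)))
```

Every chunk then has to be uploaded separately and the transcripts stitched back together in order, which is where timestamps and sentences cut at chunk boundaries tend to get messy.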
What fixed it for me was using a transcription API that takes the video URL directly. No downloading, no compression, no splitting at all. It processes the full video server-side and returns a clean transcript with speaker labels and timestamps.
Docs here: https://wayin.ai/api-docs/video-transcription/
Happy to share the simple n8n workflow JSON if it helps!
trying to split a 100mb+ audio file into 25mb chunks inside n8n is a one-way ticket to out-of-memory crashes. the guy above has the right idea about using URLs, but you don’t need an expensive custom api for it.
the standard backend pattern for this is to stop passing massive binary payloads through n8n completely. just have n8n upload the raw file to a cheap bucket (like cloudflare r2 or aws s3), generate a temporary pre-signed url, and pass that url to a serverless gpu provider like replicate or fal.ai.
they host the exact same open-source whisper-large-v3 models, they accept direct urls to bypass standard file size limits, and you only pay for the literal seconds of gpu compute time (usually pennies per hour of audio). n8n just waits for the webhook back when the transcription is done.