Android .m4a audio files fail with OpenAI Transcription node

Describe the problem/error/question

Hi everyone,

I’m using n8n Cloud and I’ve been testing the OpenAI Transcription (Whisper) node with .m4a audio files.

Here’s what I’ve found:

  • iOS Voice Memo .m4a files transcribe successfully using the OpenAI node ✅

  • Android Voice Recorder .m4a files always fail in the OpenAI node with the error below ❌

  • Android Voice Recorder .m4a files transcribe successfully using Gemini ✅

How can I transcribe these .m4a files, or convert .m4a to .mp3, without ffmpeg on n8n Cloud?

What is the error message (if any)?

Invalid file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']


Same error.

The file itself is fine; I can easily open it in Adobe Audition.

What’s the problem?

Not solved yet.

@n8n

If your question has not been solved, please let us know in a new reply on your original topic and we’ll get back on it.

Did you manage to solve it? How did you solve it?

On the cloud version, you have to use an external service to convert the file.

On self-hosted, ffmpeg does the job.
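For the self-hosted case, here is a minimal sketch of what the ffmpeg step could look like when driven from Python. It assumes `ffmpeg` is on the PATH and was built with the `libmp3lame` encoder; the function names and the mono / 16 kHz / 64k settings are my own illustrative choices, not something n8n or OpenAI prescribes.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that re-encodes the audio as MP3.

    The mono / 16 kHz / 64 kbit/s settings are assumptions chosen to
    keep speech recordings small; adjust them as needed.
    """
    return [
        "ffmpeg",
        "-y",                   # overwrite dst if it already exists
        "-i", src,              # input file, e.g. the Android .m4a
        "-vn",                  # ignore any video / cover-art stream
        "-ac", "1",             # downmix to mono
        "-ar", "16000",         # resample to 16 kHz
        "-c:a", "libmp3lame",   # MP3 encoder (must be compiled into ffmpeg)
        "-b:a", "64k",          # audio bitrate
        dst,
    ]


def convert_to_mp3(src: str) -> str:
    """Re-encode src next to itself as .mp3 and return the new path."""
    dst = str(Path(src).with_suffix(".mp3"))
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Calling `convert_to_mp3("recording.m4a")` would write `recording.mp3` in the same folder, which can then be passed to the transcription node instead of the original file.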

Explanation:

Both iOS and Android use the AAC (Advanced Audio Coding) codec inside their .m4a files, but they often use different profiles and bitrates:

  • iOS Voice Memo (.m4a): Typically uses AAC-LC (Low Complexity) with standard settings that are highly compatible and widely supported. Whisper supports this.

  • Android Voice Recorder (.m4a): Can use various profiles like AAC-HE (High Efficiency) or AAC-HEv2 at very low bitrates, or a slightly non-standard implementation of AAC.
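If you want to check which profile a given recording actually uses, a rough sketch along these lines may help. It assumes `ffprobe` (shipped with ffmpeg) is installed locally; `parse_profile` and `probe_profile` are hypothetical helper names I made up for this example.

```python
import json
import subprocess

# ffprobe options: report only codec name and profile of the first audio stream, as JSON
FFPROBE_ARGS = [
    "ffprobe", "-v", "error",
    "-select_streams", "a:0",
    "-show_entries", "stream=codec_name,profile",
    "-of", "json",
]


def parse_profile(ffprobe_json: str) -> tuple[str, str]:
    """Extract (codec_name, profile) from ffprobe's JSON output."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return ("unknown", "unknown")
    first = streams[0]
    return (first.get("codec_name", "unknown"), first.get("profile", "unknown"))


def probe_profile(path: str) -> tuple[str, str]:
    """Run ffprobe on a file and return its (codec_name, profile)."""
    result = subprocess.run(
        FFPROBE_ARGS + [path],
        capture_output=True, text=True, check=True,
    )
    return parse_profile(result.stdout)
```

Running `probe_profile` on one working iOS file and one failing Android file would show whether the two really differ in AAC profile, or whether the difference lies somewhere else in the container.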

I tried a recording from an Android 7 device and it works; with a recording from Android 11 I get your error.

So it looks like a compatibility issue between Google (Android) and OpenAI, which seems to handle iOS files better.

  • iOS Voice Memo, AAC-LC (Low Complexity): Highly compatible, standard, simple profile. OpenAI’s parser reads the moov box, sees a common, well-formed definition for AAC-LC, and successfully decodes the stream.

  • Android Recorder, AAC-HE / AAC-Main (High Efficiency / Main Profile): Optimized for lower bitrates, often with more complex techniques such as SBR (Spectral Band Replication). OpenAI’s parser reads the moov box, sees a definition for HE or Main AAC that is slightly non-standard or unfamiliar to its strict internal audio library, and presumably rejects the file as an invalid format.
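To compare the container layout of a working and a failing file yourself, here is a small sketch that lists the top-level MP4 boxes (ftyp, moov, mdat, ...) and the major brand. It only follows the generic box header layout (4-byte big-endian size plus 4-byte type); it is not how OpenAI's parser works, which I don't know, and the function names are my own.

```python
import struct
from typing import Optional


def list_top_level_boxes(data: bytes) -> list[tuple[str, int]]:
    """Return (type, size) for each top-level box in an MP4/M4A byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # size == 1 means a 64-bit size follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # size == 0 means the box runs to the end of the file
            size = len(data) - offset
        if size < header:
            break  # malformed box, stop instead of looping forever
        boxes.append((box_type.decode("latin-1"), size))
        offset += size
    return boxes


def major_brand(data: bytes) -> Optional[str]:
    """Return the ftyp major brand if the file starts with an ftyp box."""
    if len(data) >= 12 and data[4:8] == b"ftyp":
        return data[8:12].decode("latin-1")
    return None
```

For example, `list_top_level_boxes(open("recording.m4a", "rb").read())` on both recordings shows whether the box order or the brand differs between the iOS and Android files.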

I couldn’t yet. Looking at the answer below, it seems impossible without a third-party library.