On the cloud version, you have to use an external service to convert the file.
On self-hosted, ffmpeg does the job.
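For the self-hosted case, here is a rough sketch of how the conversion could look if you call ffmpeg from Python. It re-encodes the recording to AAC-LC, the profile that works for me. The function names, the `_lc.m4a` output suffix and the 128k bitrate are my own assumptions, not anything OpenAI requires, and ffmpeg has to be installed and on your PATH.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ffmpeg_command(src: str, dst: str, bitrate: str = "128k") -> List[str]:
    """Build an ffmpeg command that re-encodes the audio track as AAC-LC."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file, e.g. the Android recording
        "-vn",                    # drop any video / cover-art stream
        "-c:a", "aac",            # ffmpeg's native AAC encoder
        "-profile:a", "aac_low",  # force the AAC-LC profile
        "-b:a", bitrate,          # assumed bitrate, adjust as needed
        dst,
    ]


def convert_to_aac_lc(src: str, dst: Optional[str] = None) -> str:
    """Run ffmpeg and return the path of the converted file."""
    if dst is None:
        p = Path(src)
        dst = str(p.with_name(p.stem + "_lc.m4a"))  # hypothetical naming scheme
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Calling `convert_to_aac_lc("recording.m4a")` should leave a `recording_lc.m4a` next to the original, which you can then send to Whisper instead of the Android file.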
Explanation:
Both iOS and Android use the AAC (Advanced Audio Coding) codec inside their .m4a files, but they often use different profiles and bitrates:
iOS Voice Memo (.m4a): Typically uses AAC-LC (Low Complexity) with standard settings that are highly compatible and widely supported. Whisper supports this.
Android Voice Recorder (.m4a): Can use various profiles such as HE-AAC (High Efficiency) or HE-AAC v2 at very low bitrates, or a slightly non-standard implementation of AAC.
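If you want to check which profile a given file actually uses before deciding whether to convert it, ffprobe (it ships with ffmpeg) can report it. The sketch below is only an illustration: the function names are made up, and I am assuming ffprobe labels the profile as "LC" for AAC-LC and something like "HE-AAC" for the high-efficiency variants, so verify the output on your own files.

```python
import json
import subprocess
from typing import List


def probe_command(path: str) -> List[str]:
    """ffprobe command that prints codec and profile of the first audio stream as JSON."""
    return [
        "ffprobe",
        "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name,profile",
        "-of", "json",
        path,
    ]


def is_aac_lc(ffprobe_json: str) -> bool:
    """Return True only if the first audio stream is AAC with the LC profile."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return False
    stream = streams[0]
    # Assumption: ffprobe reports the profile string "LC" for AAC-LC.
    return stream.get("codec_name") == "aac" and stream.get("profile") == "LC"


def file_is_aac_lc(path: str) -> bool:
    """Run ffprobe on a real file and apply the check above."""
    result = subprocess.run(
        probe_command(path), capture_output=True, text=True, check=True
    )
    return is_aac_lc(result.stdout)
```

If `file_is_aac_lc("recording.m4a")` comes back False for an .m4a file, that file is a candidate for re-encoding before it goes to Whisper.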
I tested this: a recording from an Android 7 device works, but with one from an Android 11 device I get the same error as you.
So it looks like a compatibility problem between Google (Android) and OpenAI, which seems to get along better with iOS files:
iOS Voice Memo: AAC-LC (Low Complexity). Highly compatible, standard, simple profile. OpenAI’s parser reads the moov box, sees a perfectly common definition for AAC-LC, and successfully decodes the stream.
Android Recorder: HE-AAC or AAC Main (High Efficiency / Main Profile). Optimized for lower bitrates, often with more complex techniques such as SBR (Spectral Band Replication). OpenAI’s parser reads the moov box, sees a definition for HE or Main AAC that is slightly non-standard or unfamiliar to its strict internal audio library, and most likely rejects the file.