Google Gemini Node fails to transcribe Instagram audio (video/mp4), but works for WhatsApp (audio/ogg)

Hello everyone,

I’m running into an issue with the Google Gemini node (Transcribe a recording) when trying to transcribe audio messages from Instagram, and I’m hoping to get some advice from the community.

My Workflow Setup

My workflow is triggered by the WhatsApp Cloud webhook and handles messages from both WhatsApp and Instagram. For audio messages, the flow is:

  1. Webhook receives the message with a URL to the audio file.
  2. An HTTP Request node downloads the file from the Meta URL (lookaside.fbsbx.com/...). This step works correctly, and I get the binary data.
  3. The binary data is passed to a Google Gemini node with the “Audio” resource selected to transcribe it.

The Problem

The process works perfectly for WhatsApp messages.

  • The HTTP Request node downloads a file with Mime Type: audio/ogg.
  • The Gemini node receives this .ogg file and transcribes it without any issues.

However, the process fails for Instagram audio messages.

  • The HTTP Request node successfully downloads the file, but I’ve noticed the file format is different:
    • File Name: audioclip-....mp4
    • Mime Type: video/mp4 (even though it’s just an audio message)
  • When this .mp4 binary data is passed to the Gemini node, the node fails with the following error:
{
  "error": {
    "code": 500,
    "message": "Failed to convert server response to JSON",
    "status": "INTERNAL"
  }
}

My Hypothesis

My guess is that the Gemini node’s audio transcription endpoint cannot process a video/mp4 container, even if it only contains an audio track. It expects a pure audio format like the audio/ogg it receives from WhatsApp. The error message seems generic, but the root cause appears to be the file format incompatibility.
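
For anyone who wants to check this hypothesis on their own files, here is a minimal sketch that looks at the first bytes of the downloaded binary instead of trusting the reported Mime Type. It relies on two well-known signatures: Ogg streams start with "OggS", and MP4-family files carry an "ftyp" box type at byte offset 4. The function names and the "convert anything that is not Ogg" rule are assumptions made for illustration, not something n8n or Gemini provides.

```python
def sniff_container(data: bytes) -> str:
    """Guess the container format from the leading bytes of a file."""
    # Ogg pages begin with the capture pattern "OggS".
    if data[:4] == b"OggS":
        return "ogg"
    # MP4/M4A files start with a box whose type field (bytes 4..8) is "ftyp".
    if data[4:8] == b"ftyp":
        return "mp4"
    return "unknown"


def needs_conversion(data: bytes) -> bool:
    """Hypothetical routing rule: only Ogg is passed through untouched."""
    return sniff_container(data) != "ogg"
```

If the Instagram file reports "mp4" here while the WhatsApp file reports "ogg", that at least confirms the two platforms deliver different containers, whatever the root cause of the 500 error turns out to be.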

My Question for the Community

  1. Has anyone else encountered this issue with Instagram audio messages?
  2. Is there a recommended best practice for handling Instagram audio transcription in n8n?

Information on your n8n setup

  • n8n version: 1.104.0
  • Database (default: SQLite): Postgres
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): self-hosted
  • Operating system: Ubuntu 24.10 VPS

Hi @sewan, in my case I get "Request failed with status code 400" for all binary files!

But it works perfectly with URLs.

Hi @sewan. I have the same problem with Instagram audio messages when using the OpenAI node to transcribe them. Please let me know if you find a solution.

Use a custom script or an external tool such as ffmpeg to convert the .mp4 file to a supported audio format (e.g., .ogg or .mp3).

Some APIs (and possibly Gemini) use the file extension and MIME type to determine how to process the file. If you pass a video/mp4 file, the API may expect video content, not just audio, and fail if it doesn’t find what it expects.
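
As a rough sketch of the conversion suggested above, the following Python helper builds and runs an ffmpeg command that ignores any video stream and re-encodes the audio into an Ogg file. It assumes ffmpeg is installed where the script runs and was built with the libopus encoder; the function names and file paths are hypothetical.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg argv that extracts the audio track from src into dst."""
    return [
        "ffmpeg",
        "-y",               # overwrite dst if it already exists
        "-i", src,          # input file, e.g. the downloaded audioclip .mp4
        "-vn",              # skip video streams, keep audio only
        "-c:a", "libopus",  # assumed encoder; requires an ffmpeg build with libopus
        dst,
    ]


def convert_to_ogg(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Calling convert_to_ogg("audioclip.mp4", "audioclip.ogg") would then produce a file that can be sent on as audio/ogg. Whether that alone resolves the Gemini error is untested here.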

I couldn’t solve it that way, but I resolved it by sending an HTTP request to Groq to transcribe the audio. The problem is that OpenRouter doesn’t seem to work for transcription (or at least I couldn’t find a way to make it work).
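
The reply above does not include the actual request, so the following is only a sketch of what such a call might look like outside n8n. It assumes Groq's OpenAI-compatible transcription endpoint and the whisper-large-v3 model name; both, along with the helper name, are assumptions that should be checked against Groq's current documentation. The request is only built here, not sent.

```python
import urllib.request
import uuid

# Assumed endpoint and model name; verify against Groq's API documentation.
GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"
GROQ_MODEL = "whisper-large-v3"


def build_transcription_request(
    audio: bytes, filename: str, api_key: str
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the audio file and model name."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field with the model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{GROQ_MODEL}\r\n").encode(),
        # File field header, followed by the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        GROQ_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it would be a matter of passing the returned object to urllib.request.urlopen and reading the JSON response. In n8n the equivalent is an HTTP Request node with a multipart form body containing the binary property.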

@sewan maybe this will work for you
