Hello, I’m building a workflow that creates several binary audio files that I’d like to combine into a single file. The generation of the files is working. I can even aggregate them into a single item, thinking maybe it could help a downstream node to merge them. But I haven’t found a way to combine them to give a single audio file as result. Does anyone know how to do this? Is there a ‘magic audio file manipulation’ node somewhere? Or maybe it can be done with some clever JS in a code node?
I’m on n8n 2.10.4, on the cloud
There’s no built-in audio merge node, but there are a few approaches that work:
Option 1: Execute Command node with ffmpeg (cleanest)
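Save each audio file to disk first, then use Execute Command to run ffmpeg. Note that the Execute Command node isn’t available on n8n Cloud, so this needs a self-hosted instance with ffmpeg installed: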
ffmpeg -i /tmp/audio1.mp3 -i /tmp/audio2.mp3 -filter_complex "[0:a][1:a]concat=n=2:v=0:a=1[out]" -map "[out]" /tmp/combined.mp3
You’d use a Code node to write the binary data to temp files, Execute Command to merge, then read the output back as binary. A bit verbose but very reliable.
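If you have more than two files, the filter string has to grow with the number of inputs. Here is a minimal sketch of how the command above could be generalized to N files. `buildConcatArgs` is a hypothetical helper name (not an n8n or ffmpeg API), and the paths are placeholders:

```javascript
// Hypothetical helper: builds the ffmpeg argument list for N input files,
// following the same concat filter pattern as the two-file command above.
function buildConcatArgs(inputPaths, outputPath) {
  if (inputPaths.length < 2) {
    throw new Error('Need at least two input files to concatenate');
  }
  // One "-i <path>" pair per input file.
  const inputs = inputPaths.flatMap(p => ['-i', p]);
  // "[0:a][1:a]...": the audio stream of each input, in order.
  const streams = inputPaths.map((_, i) => `[${i}:a]`).join('');
  const filter = `${streams}concat=n=${inputPaths.length}:v=0:a=1[out]`;
  return [...inputs, '-filter_complex', filter, '-map', '[out]', outputPath];
}

// Placeholder paths, for illustration only.
const args = buildConcatArgs(
  ['/tmp/audio1.mp3', '/tmp/audio2.mp3', '/tmp/audio3.mp3'],
  '/tmp/combined.mp3'
);
// Printed for inspection; the filter and map values need quoting if pasted into a shell.
console.log(['ffmpeg', ...args].join(' '));
```

The array form is meant for something like Node’s `child_process.execFile('ffmpeg', args)`, which avoids shell quoting. In the Execute Command node you would build the quoted string instead.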
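Option 2: Code node with buffer concatenation (for WAV/PCM only)
If your files are uncompressed PCM WAV with identical sample rate, bit depth and channel count, you can strip the headers from all but the first file and concatenate the raw audio data in a Code node. Decode the base64 to buffers first, since joining the base64 strings directly won’t produce valid data. With the canonical 44-byte WAV header, the audio data starts at byte 44 (files with extra chunks will have a longer header). Something like: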
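// Assumes binary data is held in memory as base64 under the 'data' property
const buffers = items.map(item => Buffer.from(item.binary.data.data, 'base64'));
// First buffer keeps its header, rest strip the 44-byte WAV header
const combined = Buffer.concat([buffers[0], ...buffers.slice(1).map(b => b.subarray(44))]);
// Update the RIFF chunk size (offset 4) and the data chunk size (offset 40)
combined.writeUInt32LE(combined.length - 8, 4);
combined.writeUInt32LE(combined.length - 44, 40);
return [{ json: {}, binary: { data: { data: combined.toString('base64'), mimeType: 'audio/wav', fileName: 'combined.wav' } } }];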
This gets fiddly with the header size fields so I’d only use it for simple cases.
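If you don’t want to rely on the header being exactly 44 bytes, one option is to walk the RIFF chunks and look for the `data` chunk. A rough sketch, assuming all inputs are PCM WAV with the same format (it does not check that). `findDataChunk` and `concatWav` are made-up names for illustration, and any chunks after `data` in the first file are dropped:

```javascript
// Hypothetical helper: returns the byte range of the "data" chunk payload.
function findDataChunk(wav) {
  if (wav.toString('ascii', 0, 4) !== 'RIFF' || wav.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE buffer');
  }
  let offset = 12; // first chunk starts after "RIFF", size, "WAVE"
  while (offset + 8 <= wav.length) {
    const id = wav.toString('ascii', offset, offset + 4);
    const size = wav.readUInt32LE(offset + 4);
    if (id === 'data') {
      const start = offset + 8;
      return { start, end: Math.min(start + size, wav.length) };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No data chunk found');
}

// Hypothetical helper: keeps the first file's header and appends every payload.
// Assumption: all buffers share sample rate, bit depth and channel count.
function concatWav(buffers) {
  const first = findDataChunk(buffers[0]);
  // Copy the header so the input buffer is not modified.
  const header = Buffer.from(buffers[0].subarray(0, first.start));
  const payloads = buffers.map(b => {
    const { start, end } = findDataChunk(b);
    return b.subarray(start, end);
  });
  const dataSize = payloads.reduce((n, p) => n + p.length, 0);
  header.writeUInt32LE(header.length - 8 + dataSize, 4); // RIFF chunk size
  header.writeUInt32LE(dataSize, first.start - 4);       // data chunk size
  return Buffer.concat([header, ...payloads]);
}
```

In a Code node you would pass it the decoded buffers, e.g. `concatWav(buffers)`, and return the result as base64 binary data the same way as in the shorter snippet.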
Option 3: External API
If you’re generating audio via ElevenLabs or similar, check whether the provider can do the concatenation server-side, which would save you the ffmpeg complexity entirely.
What format are your audio files and how are you generating them? That’ll narrow down the best approach.
Hi Derek, and thank you for your reply with alternative approaches. In my workflow, the audio files are generated by an HTTP Request node that receives several chunks of text formatted as JSON and outputs several mp3 files, as separate items. I’m currently on n8n Cloud, so I’d have to go with option 2, the Code node. How would the concatenation change for mp3 files?