I want to create a workflow that lets me modify a video by adding text and a voiceover of that text in my own voice. I managed to add the text to the image (template) and to convert the text to speech, but I can't add the voice to the image.
- Generate Voice from Text
  - Use the ElevenLabs or Google TTS API (HTTP Request node).
  - Save the audio as voiceover.mp3 (Write Binary File node).
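For anyone who would rather script this step than wire up the HTTP Request node, here is a minimal Python sketch of the same call. It assumes the ElevenLabs text-to-speech endpoint shape `/v1/text-to-speech/{voice_id}` with an `xi-api-key` header; the function names, the voice ID and the key are placeholders, so check the current ElevenLabs docs before relying on it.

```python
import json
import urllib.request

# Assumed endpoint shape for ElevenLabs text-to-speech; verify against the current docs.
ELEVENLABS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks for MP3 audio."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_URL.format(voice_id=voice_id),
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
    )


def save_voiceover(request: urllib.request.Request, path: str = "voiceover.mp3") -> str:
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return path
```

Calling `save_voiceover(build_tts_request("Hello", "YOUR_VOICE_ID", "YOUR_API_KEY"))` would write voiceover.mp3 next to the script. Building and sending are split so the request can be inspected without hitting the API.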
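- Add Text to Image
  - Use Python (Pillow/OpenCV), e.g. a script called from an Execute Command node, since the Function node runs JavaScript rather than Python, or use the Cloudinary API (HTTP Request node).
  - Save the result as image_with_text.png.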
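You already have this step working, but for completeness here is a rough Pillow sketch of it. It assumes Pillow is installed; `add_caption`, the built-in default font and the 40 px bottom margin are placeholders of mine, not anything from your template.

```python
from PIL import Image, ImageDraw, ImageFont


def add_caption(template_path: str, text: str,
                output_path: str = "image_with_text.png") -> str:
    """Draw `text` horizontally centered near the bottom edge and save a PNG."""
    image = Image.open(template_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    # Built-in fallback font; swap in ImageFont.truetype(...) for a real typeface.
    font = ImageFont.load_default()

    # Measure the rendered width so the text can be centered.
    text_width = draw.textlength(text, font=font)
    x = int((image.width - text_width) / 2)
    y = image.height - 40  # hypothetical 40 px margin from the bottom

    draw.text((x, y), text, fill="white", font=font)
    image.save(output_path)
    return output_path
```

If the picture is going into FFmpeg afterwards, a template with even width and height avoids a common libx264 complaint when encoding to yuv420p.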
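- Combine Image + Audio into Video
  - Option A: FFmpeg (local). Run in an Execute Command node:
    ffmpeg -loop 1 -i "image_with_text.png" -i "voiceover.mp3" -c:v libx264 -tune stillimage -pix_fmt yuv420p -c:a copy -shortest "output.mp4"
    Here -loop 1 repeats the still image as the video stream, -shortest stops the output when the audio ends, and -pix_fmt yuv420p keeps the file playable in players that reject the pixel format libx264 may otherwise pick for a PNG input.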
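If you end up calling FFmpeg from a script instead of the Execute Command node, the same command can be sketched in Python like this. It assumes `ffmpeg` is on the PATH; the function names are mine, and `-y` (overwrite without asking) and `-pix_fmt yuv420p` are my additions to the command above, not something FFmpeg requires.

```python
import subprocess
from typing import List


def build_ffmpeg_command(image_path: str, audio_path: str, output_path: str) -> List[str]:
    """Return the argument list that turns a still image plus audio into an MP4."""
    return [
        "ffmpeg",
        "-y",                        # assumption: overwrite output so the workflow never waits on a prompt
        "-loop", "1",                # repeat the still image as an endless video stream
        "-i", image_path,
        "-i", audio_path,            # the voiceover becomes the audio stream
        "-c:v", "libx264",
        "-tune", "stillimage",
        "-pix_fmt", "yuv420p",       # assumption: pixel format most players accept
        "-c:a", "copy",              # keep the MP3 audio as-is, no re-encode
        "-shortest",                 # stop when the shorter input (the audio) ends
        output_path,
    ]


def render_video(image_path: str, audio_path: str, output_path: str = "output.mp4") -> str:
    """Run FFmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_command(image_path, audio_path, output_path), check=True)
    return output_path
```

Passing the arguments as a list rather than one shell string means file names with spaces or quotes do not need escaping.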
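  - Option B: Shotstack API (cloud). Send the image and audio URLs via an HTTP Request node to:
    POST https://api.shotstack.io/v1/render
    {"timeline": {"tracks": [{"clips": [{"asset": {"type": "image", "src": "YOUR_IMAGE_URL"}, "start": 0, "length": 10}]}, {"clips": [{"asset": {"type": "audio", "src": "YOUR_AUDIO_URL"}, "start": 0, "length": 10}]}]}, "output": {"format": "mp4", "resolution": "sd"}}
    As far as I know, each clip needs a start as well as a length and the request needs an output block, so both are included here. Check the Shotstack docs for the current endpoint and the API key header.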
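Writing that JSON by hand gets error-prone once the URLs and duration come from earlier nodes, so here is a small Python sketch that builds the body. The `start` values, the separate track per asset and the `output` block are my assumptions about what the Shotstack render endpoint expects; `build_shotstack_payload` is a made-up helper name.

```python
import json
from typing import Any, Dict


def build_shotstack_payload(image_url: str, audio_url: str,
                            length_seconds: float) -> Dict[str, Any]:
    """Build a render request body with the image and the audio on separate tracks."""
    timing = {"start": 0, "length": length_seconds}  # both clips cover the same span
    return {
        "timeline": {
            "tracks": [
                {"clips": [{"asset": {"type": "image", "src": image_url}, **timing}]},
                {"clips": [{"asset": {"type": "audio", "src": audio_url}, **timing}]},
            ]
        },
        # Assumed output settings; adjust to whatever the API currently accepts.
        "output": {"format": "mp4", "resolution": "sd"},
    }


if __name__ == "__main__":
    payload = build_shotstack_payload("YOUR_IMAGE_URL", "YOUR_AUDIO_URL", 10)
    print(json.dumps(payload, indent=2))
```

Set `length_seconds` to the duration of the voiceover so the video does not cut off or run past the audio.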
- (Optional) Upload Video
  - Use the Google Drive or AWS S3 node to save or auto-publish the result.