I want to create a workflow that allows me to modify a video by adding text and my voice associated with the text

I managed to add the text to the image (template), and I successfully converted the text to speech, but I can't add the voice to the image.

  1. Generate Voice from Text
  • Use ElevenLabs or Google TTS API (HTTP Request node).

  • Save the audio as voiceover.mp3 (Write Binary File node).

  2. Add Text to Image
  • Use Python (Pillow/OpenCV) via an Execute Command node (the Function node runs JavaScript, not Python), or the Cloudinary API (HTTP Request node).

  • Save as image_with_text.png.

  3. Combine Image + Audio into Video

Option A: FFmpeg (Local)

  • Run in Execute Command node:
    ffmpeg -loop 1 -i "image_with_text.png" -i "voiceover.mp3" -c:v libx264 -tune stillimage -pix_fmt yuv420p -c:a aac -b:a 192k -shortest "output.mp4"
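
If you would rather call FFmpeg from a script than paste the command into the node, here is a minimal Python sketch of the same idea. The function names `build_ffmpeg_command` and `render_video` are made up for illustration. It assumes `ffmpeg` is on the PATH, re-encodes the audio to AAC and forces `yuv420p` for wider player compatibility, and adds a scale filter because libx264 with `yuv420p` needs even width and height. Treat those choices as assumptions, not requirements.

```python
import shutil
import subprocess


def build_ffmpeg_command(image_path, audio_path, output_path):
    """Return the ffmpeg argument list for still image + audio -> MP4."""
    return [
        "ffmpeg", "-y",                      # -y: overwrite the output if it exists
        "-loop", "1", "-i", image_path,      # repeat the single image as video frames
        "-i", audio_path,                    # second input: the voiceover
        "-c:v", "libx264", "-tune", "stillimage",
        # round width/height down to even numbers (needed for yuv420p)
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        "-shortest",                         # stop when the audio ends
        output_path,
    ]


def render_video(image_path, audio_path, output_path):
    """Run ffmpeg and return the output path; raises if ffmpeg fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    command = build_ffmpeg_command(image_path, audio_path, output_path)
    subprocess.run(command, check=True)
    return output_path
```

Usage would be something like `render_video("image_with_text.png", "voiceover.mp3", "output.mp4")`. Passing the arguments as a list avoids shell quoting problems when file names contain spaces.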
    

Option B: Shotstack API (Cloud)

  • Send image/audio URLs via HTTP Request node (API key in the x-api-key header) to:
    POST https://api.shotstack.io/v1/render
    {"timeline": {"tracks": [
      {"clips": [{"asset": {"type": "image", "src": "YOUR_IMAGE_URL"}, "start": 0, "length": 10}]},
      {"clips": [{"asset": {"type": "audio", "src": "YOUR_AUDIO_URL"}, "start": 0, "length": 10}]}
    ]}, "output": {"format": "mp4", "resolution": "sd"}}
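
To make the request body easier to adapt, here is a rough Python sketch that builds the payload and the HTTP request without sending it. The helper names `build_render_payload` and `build_render_request` are hypothetical. The `start` and `output` fields, the separate track for each clip, and the `x-api-key` header reflect my understanding of the Shotstack schema, so check them against the current API docs before relying on this.

```python
import json
import urllib.request

# Endpoint as given in this post; your account may use a different one.
SHOTSTACK_URL = "https://api.shotstack.io/v1/render"


def build_render_payload(image_url, audio_url, duration):
    """Build a render body with the image and audio on separate tracks."""
    def clip(asset_type, src):
        return {
            "asset": {"type": asset_type, "src": src},
            "start": 0,            # both clips begin at t=0
            "length": duration,    # seconds; match your voiceover length
        }

    return {
        "timeline": {
            "tracks": [
                {"clips": [clip("image", image_url)]},
                {"clips": [clip("audio", audio_url)]},
            ]
        },
        "output": {"format": "mp4", "resolution": "sd"},
    }


def build_render_request(payload, api_key):
    """Wrap the payload in a POST request (not sent here)."""
    return urllib.request.Request(
        SHOTSTACK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

To actually submit it, you would pass the request to `urllib.request.urlopen(...)` and read the JSON response. In n8n the same body goes into the HTTP Request node as JSON.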
    

(Optional) Upload Video

  • Use the Google Drive or AWS S3 node to save or auto-publish the video.