AI Product Video Ad Generator (Text-to-Speech + Video with Wavespeed & ElevenLabs) 

Hey there,
I’m Marc. If you’ve enjoyed this article, please consider giving it a “like”. Over the next few days, I’ll be sharing the workflows I’ve been building recently.
Workflow Description
This workflow automatically generates personalized product video ads by combining artificial intelligence, video generation, text-to-speech, and audiovisual automation.
PART 1 – Product Video Generation:
The workflow starts when the user sends a request, for example, “Create a 5-second ad for my new energy drink.” A language model (llama3-70B-versatile) identifies relevant marketing trends and generates a visual prompt for the ad. This prompt is sent to the Wavespeed API, which creates a 5-6 second video. The system periodically checks the video’s generation status until it’s finished, then downloads and stores the video locally.
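The status-checking loop described above can be sketched in Python roughly as follows. This is only an illustration, not the workflow's actual nodes: `check_status` is a hypothetical callable standing in for whatever request returns the job state, and the status values `"completed"` and `"failed"` are assumptions rather than confirmed Wavespeed field values.

```python
import time
from typing import Callable


def wait_for_video(check_status: Callable[[], dict],
                   interval_s: float = 5.0,
                   timeout_s: float = 300.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll a job until it reports a terminal state and return the last response.

    check_status is a hypothetical callable that returns the job state as a dict.
    The "completed" and "failed" status names are assumptions for this sketch.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"video generation failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError("video was not ready before the timeout")
        # Wait before asking again so the API is not flooded with requests.
        sleep(interval_s)
```

Passing `sleep` as a parameter makes the loop easy to try out with canned responses before wiring it to a real status request. The timeout keeps a stuck job from blocking the rest of the pipeline indefinitely.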
PART 2 – Audio Generation:
The product name and description (e.g., Thunderbolt XR) are set. This context is used by the same language model to generate a promotional dialogue or script about the product. The text is cleaned (removing unwanted parts and normalizing spaces) and sent to ElevenLabs’ API to convert it into realistic speech with a predefined voice. The resulting audio is saved as an .mp3 file.
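As a rough sketch of the cleaning and text-to-speech steps, the Python below strips some common wrapping from a generated script and prepares an ElevenLabs request. The exact "unwanted parts" the workflow removes are not specified, so the patterns here (markdown marks, bracketed stage directions, surrounding quotes) are assumptions. The endpoint, the `xi-api-key` header and the `model_id` value reflect my understanding of the public ElevenLabs API and should be checked against the current documentation.

```python
import json
import re
import urllib.request


def clean_script(raw: str) -> str:
    """Remove assumed LLM wrapping from a script and normalize whitespace."""
    text = re.sub(r"[*_#`]+", " ", raw)        # markdown emphasis and heading marks
    text = re.sub(r"\[[^\]]*\]", " ", text)    # stage directions such as [music]
    text = re.sub(r"\s+", " ", text)           # collapse newlines, tabs, repeated spaces
    return text.strip().strip('"').strip()     # drop surrounding quotes and spaces


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a text-to-speech request for a predefined voice."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    # model_id is an assumption; use whichever model your account supports.
    body = json.dumps({"text": text, "model_id": "eleven_multilingual_v2"}).encode("utf-8")
    headers = {
        "xi-api-key": api_key,
        "Content-Type": "application/json",
        "Accept": "audio/mpeg",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_speech(request: urllib.request.Request, path: str = "voiceover.mp3") -> None:
    """Send the request and write the returned audio bytes to an .mp3 file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

A typical call would be `save_speech(build_tts_request(clean_script(script), voice_id, api_key))`, where `voice_id` is one of the ElevenLabs voice IDs and `api_key` is your own key.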
PART 3 – Combining Video and Audio:
Finally, the video from Part 1 and the audio from Part 2 are merged using FFmpeg. This synchronizes the animated visuals with the voiceover, producing a fully personalized, engaging video ad ready for social media or marketing campaigns.
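The merge step can be approximated with an FFmpeg call like the one built below. This is a sketch under assumptions: the file names are placeholders, and the workflow's actual FFmpeg arguments are not shown in this post, so the flags here are simply a common way to attach a voiceover to a video.

```python
import subprocess
from typing import List


def build_merge_command(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Build an FFmpeg command that pairs the video stream with the voiceover."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it already exists
        "-i", video_path,      # input 0: generated video
        "-i", audio_path,      # input 1: generated speech
        "-map", "0:v:0",       # take the video stream from input 0
        "-map", "1:a:0",       # take the audio stream from input 1
        "-c:v", "copy",        # keep the video as-is, no re-encoding
        "-c:a", "aac",         # encode the audio to AAC for the MP4 container
        "-shortest",           # stop at the end of the shorter stream
        output_path,
    ]


def merge(video_path: str, audio_path: str, output_path: str) -> None:
    """Run FFmpeg; requires the ffmpeg binary to be available on PATH."""
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)
```

Because `-shortest` ends the output with the shorter stream, a voiceover longer than the 5-6 second clip will be cut off, so it is worth keeping the script short enough to fit.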
Tools Used
- Llama3-70B-versatile (via Groq)
- ElevenLabs API (text-to-speech)
- Wavespeed AI (video generation)
- FFmpeg (for audio/video merging)
AI Product Video Ad Generator Workflow 
Created by Marc Serrano
Outputs 
User Prompt: Create a 5-second ad for my artisanal chocolate brand.
Brand Name: Noir Cacao
Short Description:
Noir Cacao is a premium artisanal chocolate brand that celebrates the purity of single-origin cacao. Crafted in small batches with natural ingredients and no additives, each bar offers a rich, intense, and authentic sensory experience. From bean to wrapper, Noir Cacao honors the tradition of fine chocolate with a modern and elegant approach.

User Prompt: Create a 5-second ad for my luxury watch brand.
Product Name: Aurum Timepieces
Short Description:
Aurum Timepieces embodies the excellence of Swiss watchmaking, combining timeless design with mechanical precision. Created for those who value detail and distinction, each watch is a masterpiece forged in gold and elegance.

Tips 
- The API requests to ElevenLabs for text-to-speech can be simplified, because n8n already includes a dedicated ElevenLabs node.
- The best video generation model I have used in terms of quality is minimax/hailuo-02/t2v-standard. It delivers high-resolution, clear, and visually impressive videos, making it ideal for professional and detailed video creation.
- The minimax/hailuo-02/i2v-pro model generates animated videos from a reference image combined with a user’s text prompt. It creates high-quality 1080p videos that animate the image based on the described scene or style, so you can produce customized videos from static images. To generate this kind of video with my workflow, you only need to change the model used in the Wavespeed video generation step.

- The best free voices to use in ElevenLabs are: YOq2y2Up4RgXP2HyXjE5 (Gaming – Unreal Tonemanagement 2003), CeNX9CMwmxDxUF5Q2Inm (Johnny Dynamite - 80s Radio DJ), and k8cFOyAg7B9qwBlDDNTC (Miguel - Una voz natural ideal para comerciales).
Perfect for
- Automated marketing teams
- Content creators showcasing products
- E-commerce projects or personalized demos
Information on your n8n setup
- n8n version: 1.101.1
- Running n8n via (Docker, npm, n8n cloud, desktop app): npm
- Operating system: Windows
