AI Product Video Ad Generator (Text-to-Speech + Video with Wavespeed & ElevenLabs)

Swapping manual work for AI magic

Hey there :wave: I’m Marc, and if you enjoy this article, please consider giving it a “like” :pink_heart:. Over the next few days, I’ll be sharing the workflows I’ve been building recently.

Workflow Description

This workflow automatically generates personalized product video ads by combining artificial intelligence, video generation, text-to-speech and audiovisual automation.

PART 1 – Product Video Generation: :film_projector:
The workflow starts when the user sends a request, for example, “Create a 5-second ad for my new energy drink.” A language model (llama3-70B-versatile) identifies relevant marketing trends and generates a visual prompt for the ad. This prompt is sent to the Wavespeed API, which creates a 5-6 second video. The system periodically checks the video’s generation status until it’s finished, then downloads and stores the video locally.
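The status check in Part 1 is a polling loop. Below is a minimal Python sketch of that idea, not the workflow’s actual n8n nodes: `poll_until_done` and `fetch_status` are hypothetical names, and the `"completed"`/`"failed"` status strings are assumptions about the job payload, so check them against the Wavespeed API documentation before relying on them.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a video-generation job until it finishes.

    fetch_status is a hypothetical callable that queries the job (for example
    an HTTP GET against the provider's result URL) and returns a dict with a
    "status" key. The status values checked below are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":   # assumed "done" value
            return result
        if status == "failed":      # assumed error value
            raise RuntimeError(f"video generation failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        sleep(interval_s)           # wait before checking again
```

Injecting `sleep` keeps the loop testable: passing a no-op makes it run instantly, while the default waits `interval_s` seconds between checks.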


PART 2 – Audio Generation: :speaker_high_volume:
The product name (e.g., Thunderbolt XR) and a short description are set. The same language model uses this context to generate a promotional dialogue or script about the product. The text is then cleaned (removing unwanted parts and normalizing spaces) and sent to ElevenLabs’ API, which converts it into realistic speech using a predefined voice. The resulting audio is saved as an .mp3 file.
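The cleaning and text-to-speech steps can be sketched in Python as follows. The cleanup rules in `clean_script` are illustrative assumptions (the post does not list the exact rules), and the function names are hypothetical. `build_tts_request` targets ElevenLabs’ `POST /v1/text-to-speech/{voice_id}` endpoint with the `xi-api-key` header; the `eleven_multilingual_v2` model ID is an assumed default, so confirm it against your account.

```python
import json
import re
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def clean_script(raw: str) -> str:
    """Tidy LLM output before sending it to text-to-speech.

    Illustrative rules (assumed, not the workflow's exact ones):
    drop markdown markers and [bracketed] stage directions, then
    collapse every whitespace run into a single space.
    """
    text = re.sub(r"[*_#`]", "", raw)       # markdown emphasis/heading markers
    text = re.sub(r"\[[^\]]*\]", "", text)  # [stage directions]
    text = re.sub(r"\s+", " ", text)        # normalize spaces and newlines
    return text.strip()


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) an ElevenLabs text-to-speech request."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # ask for MP3 audio back
        },
    )


def save_speech(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned MP3 bytes to disk."""
    with urllib.request.urlopen(request, timeout=60) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

For example, `save_speech(build_tts_request(clean_script(script), "k8cFOyAg7B9qwBlDDNTC", api_key), "voiceover.mp3")` would clean a generated script and save it with one of the voices recommended in the tips.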


PART 3 – Combining Video and Audio: :film_projector: + :speaker_high_volume:
Finally, the video from Part 1 and the audio from Part 2 are merged using FFmpeg. This synchronizes the animated visuals with the voiceover, producing a fully personalized, engaging video ad ready for social media or marketing campaigns.
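A minimal sketch of the merge step, assuming FFmpeg is on the PATH; the function and file names are hypothetical. The command copies the video stream unchanged, re-encodes the voiceover to AAC (the most widely supported audio codec in MP4), and uses `-shortest` so the output ends with the shorter of the two streams.

```python
import shutil
import subprocess


def build_merge_cmd(video: str, audio: str, output: str) -> list[str]:
    """Return an ffmpeg command that puts the voiceover on the video."""
    return [
        "ffmpeg", "-y",          # overwrite output without asking
        "-i", video,             # input 0: generated video
        "-i", audio,             # input 1: ElevenLabs MP3
        "-map", "0:v:0",         # take video from input 0
        "-map", "1:a:0",         # take audio from input 1
        "-c:v", "copy",          # no video re-encode
        "-c:a", "aac",           # encode audio as AAC for MP4
        "-shortest",             # stop at the shorter stream
        output,
    ]


def merge(video: str, audio: str, output: str) -> None:
    """Run the merge, failing loudly if ffmpeg is missing or errors out."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(build_merge_cmd(video, audio, output), check=True)
```

With a 5-6 second clip, `-shortest` cuts off any voiceover that runs longer; dropping the flag keeps the full voiceover, but the video stream then ends before the audio does.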

:hammer_and_wrench: Tools Used

  • Llama3-70B-versatile (via Groq)
  • ElevenLabs API (text-to-speech)
  • Wavespeed AI (video generation)
  • FFmpeg (for audio/video merging)

AI Product Video Ad Generator Workflow :gear:

:person_juggling: Created by Marc Serrano

Outputs :fire:

User Prompt: Create a 5-second ad for my artisanal chocolate brand :chocolate_bar:

Brand Name: Noir Cacao
Short Description :page_facing_up::
Noir Cacao is a premium artisanal chocolate brand that celebrates the purity of single-origin cacao. Crafted in small batches with natural ingredients and no additives, each bar offers a rich, intense, and authentic sensory experience. From bean to wrapper, Noir Cacao honors the tradition of fine chocolate with a modern and elegant approach.

Noir Cacao Ad video

User Prompt: Create a 5-second ad for my luxury watch brand :watch:

Product Name: Aurum Timepieces
Short Description :page_facing_up: :
Aurum Timepieces embodies the excellence of Swiss watchmaking, combining timeless design with mechanical precision. Created for those who value detail and distinction, each watch is a masterpiece forged in gold and elegance.

Aurum Timepieces Ad video

Tips :star:

  • The HTTP requests to ElevenLabs for text-to-speech can be simplified, because n8n already includes a dedicated ElevenLabs node.

  • The best video generation model I have used in terms of quality is minimax/hailuo-02/t2v-standard. It delivers high-resolution, clear, and visually impressive videos, making it ideal for professional and detailed video creation.

  • The minimax/hailuo-02/i2v-pro model generates animated videos from a reference image combined with a user’s text prompt. It creates high-quality, 1080p videos that animate the image based on the described scene or style, allowing users to produce customized videos easily from static images. Example from this image:

To generate this video with my workflow, you only need to change the Wavespeed model used for video generation (and provide the reference image).
Video generated based on the image of the car

  • The best free voices to use in ElevenLabs are: YOq2y2Up4RgXP2HyXjE5 (Gaming – Unreal Tonemanagement 2003), CeNX9CMwmxDxUF5Q2Inm (Johnny Dynamite - 80s Radio DJ), and k8cFOyAg7B9qwBlDDNTC (Miguel - Una voz natural ideal para comerciales, i.e. “a natural voice ideal for commercials”).
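Switching between the two Hailuo models and the recommended voices is essentially a configuration choice. A minimal sketch, assuming you pick the image-to-video model whenever a reference image is supplied; `AdSettings`, `choose_settings` and the voice labels are hypothetical, and no particular Wavespeed request fields are implied.

```python
from dataclasses import dataclass
from typing import Optional

# Model IDs from the tips: text-to-video vs. image-to-video
T2V_MODEL = "minimax/hailuo-02/t2v-standard"
I2V_MODEL = "minimax/hailuo-02/i2v-pro"

# Free ElevenLabs voice IDs from the tips (labels are hypothetical)
VOICES = {
    "gaming": "YOq2y2Up4RgXP2HyXjE5",
    "radio_dj": "CeNX9CMwmxDxUF5Q2Inm",
    "miguel": "k8cFOyAg7B9qwBlDDNTC",
}


@dataclass
class AdSettings:
    video_model: str
    voice_id: str
    image_url: Optional[str] = None


def choose_settings(image_url: Optional[str] = None, voice: str = "radio_dj") -> AdSettings:
    """Use i2v-pro when a reference image is given, otherwise t2v-standard."""
    model = I2V_MODEL if image_url else T2V_MODEL
    return AdSettings(video_model=model, voice_id=VOICES[voice], image_url=image_url)
```

Keeping these IDs in one place means the rest of the pipeline does not change when you swap models or voices.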

:light_bulb: Perfect for

  • Automated marketing teams
  • Content creators showcasing products
  • E-commerce projects or personalized demos

Information on your n8n setup

  • n8n version: 1.101.1
  • Running n8n via (Docker, npm, n8n cloud, desktop app): npm
  • Operating system: Windows

:folded_hands: Thank you for checking out this workflow.
If you found it interesting or useful, I’d really appreciate a like :heart:.
Also, your comments and suggestions :speech_balloon: are valuable to help me improve and keep sharing new automations with the community. :rocket:

Hey Marc, this sounds like an awesome workflow! I love how you’re combining multiple AI tools to create such an efficient and automated process for generating product video ads. The integration of text-to-speech and video generation is such a powerful way to reduce manual work while delivering personalized content quickly. I can imagine how useful this could be for marketers who need to produce engaging video ads on the fly. It’s also great that you’re using cutting-edge technology like Wavespeed and ElevenLabs to bring it all together. Looking forward to seeing more of your workflows! If you’re interested in learning more about different types of video production, check out this blog post: Types of Video Production.