Help Finalizing a Bigfoot Vlog-Style Video Workflow in n8n (No Veo 3; Looking for a Working D-ID or Alternative Stack)

Hey n8n community :waving_hand:

I’m working on a fun, semi-viral automation inspired by the “Bigfoot Vlog” AI videos blowing up on TikTok and Instagram right now. The idea is to go from script → voiceover → talking video entirely in n8n (ideally without Google Veo 3 or paid studio tools like Runway).

:white_check_mark: What I’ve built so far:

  • Generated the script with OpenAI (n8n HTTP Request + OpenAI node)
  • Turned it into audio with ElevenLabs (custom POST node with a working voice ID + streaming; rough sketch after this list)
  • Sent the audio via Telegram using a binary-data workaround
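
For context, here’s roughly what that ElevenLabs step boils down to as plain code. The voice ID, model name, and env-var name are placeholders, so treat it as a sketch of my HTTP Request node rather than a drop-in:

```typescript
// Sketch: ElevenLabs text-to-speech as a raw HTTP call (placeholders for voice ID, model, API key).
const VOICE_ID = "YOUR_VOICE_ID";
const API_KEY = process.env.ELEVENLABS_API_KEY ?? "";

export async function textToSpeech(text: string): Promise<Buffer> {
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`, {
    method: "POST",
    headers: {
      "xi-api-key": API_KEY,
      "Content-Type": "application/json",
      "Accept": "audio/mpeg",
    },
    body: JSON.stringify({
      text,
      model_id: "eleven_multilingual_v2", // assumption – use whatever TTS model you have access to
    }),
  });
  if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);
  // MP3 bytes – in n8n this ends up as a binary property on the item
  return Buffer.from(await res.arrayBuffer());
}
```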

:brick: What I need help with:

I’m trying to generate a talking video from an image (a Bigfoot photo) + the ElevenLabs audio, in the style of D-ID or HeyGen.
The D-ID node has limitations, though, and doesn’t handle binary audio well on n8n Cloud.
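
One idea I want to sanity-check: skip the node and call D-ID’s Talks endpoint directly with a *hosted* audio URL instead of binary data. Sketch below; the endpoint, field names, and auth scheme are my reading of the D-ID docs, the URLs are placeholders, and I haven’t verified this from inside n8n Cloud, so please correct me if it’s off:

```typescript
// Sketch: create a D-ID "talk" from a still image + a hosted audio file.
// API key, URLs, and field names are placeholders / my reading of the docs.
const DID_API_KEY = process.env.DID_API_KEY ?? ""; // D-ID reportedly expects Basic auth

export async function createTalk(imageUrl: string, audioUrl: string): Promise<string> {
  const res = await fetch("https://api.d-id.com/talks", {
    method: "POST",
    headers: {
      "Authorization": `Basic ${Buffer.from(DID_API_KEY).toString("base64")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      source_url: imageUrl,                            // publicly reachable Bigfoot photo
      script: { type: "audio", audio_url: audioUrl },  // hosted ElevenLabs MP3 instead of binary
    }),
  });
  if (!res.ok) throw new Error(`D-ID create talk failed: ${res.status}`);
  const data = (await res.json()) as { id: string };
  return data.id; // talk id to poll later for the rendered video
}
```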

Has anyone successfully automated:

  • Image + custom audio → talking video (via webhook, D-ID, or an alternative; polling sketch after this list)
  • Or found a workaround that doesn’t involve Veo 3 or RunwayML?
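
For completeness, the last piece I’m picturing is polling the created talk until a result URL appears, then downloading that for Telegram/CapCut. Again a rough sketch; the status values and `result_url` field are assumptions from the docs:

```typescript
// Sketch: poll D-ID until the talk video is ready, then return its download URL.
// Status values and the result_url field are assumptions; adjust to the real API responses.
export async function waitForTalk(talkId: string, apiKey: string): Promise<string> {
  for (let attempt = 0; attempt < 30; attempt++) {
    const res = await fetch(`https://api.d-id.com/talks/${talkId}`, {
      headers: { "Authorization": `Basic ${Buffer.from(apiKey).toString("base64")}` },
    });
    if (!res.ok) throw new Error(`D-ID status check failed: ${res.status}`);
    const data = (await res.json()) as { status: string; result_url?: string };
    if (data.status === "done" && data.result_url) return data.result_url; // finished video
    if (data.status === "error") throw new Error("D-ID reported an error for this talk");
    await new Promise((r) => setTimeout(r, 5000)); // wait 5s between polls
  }
  throw new Error("Timed out waiting for the D-ID talk to render");
}
```

In n8n terms I assume this maps to an HTTP Request node plus a Wait/IF loop, but I’d love to hear if anyone has a cleaner pattern.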

Happy to share the full working parts of the workflow if it helps troubleshoot the last piece. Just trying to get the final lip-sync video output before post-editing in CapCut or Canva.

Would love any help, insights, or similar examples. :folded_hands:

Thanks in advance!