I would say that AI image generation is getting better every day, so I think you can leave the image generation entirely to the model if you really use the best models and prompt them extremely well.
Use this; I use it for almost all of my media generation:
And
That is another approach you can take: you actually set a template in Canva and use the native node to add an image or text. That gives you less control, but it makes very sure your images come out the way you want them to. So if you are scaling this, consider that approach; otherwise, Gemini can create a lot of good media.
So for image content, if you follow the scaled approach: first generate a good background image related to the context, then pass it down to the Canva node, where you address the image first, then add the text, and then export it. With this kind of setup you are far less prone to prompt-related errors, unless the image you generated does not have enough context for quality.
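As a rough illustration of that split, here is a minimal Python sketch where both functions are hypothetical stand-ins: one for the image-generation node and one for the Canva composition step. The point is just the ordering — the background exists first, then the text is layered on top:

```python
def generate_background(context: str) -> dict:
    """Stand-in for the image-generation node (e.g. Gemini): returns the asset it produced."""
    return {"asset": f"background for: {context}", "context": context}

def canva_compose(background: dict, text: str) -> dict:
    """Stand-in for the Canva node: address the image first, then add the text layer."""
    return {
        "layers": [
            {"type": "image", "source": background["asset"]},
            {"type": "text", "value": text},
        ],
        "export_format": "png",
    }

design = canva_compose(generate_background("summer sale announcement"), "50% Off This Week")
```

Because the text never goes through the image model, a bad generation can only cost you the background, not the copy.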
YES! Please use Gemini models for all the media tasks, use GPT-4o or newer models for tool calling, and use Claude for writing, like text, headings, and so on.
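If you wire that split into one workflow, a tiny dispatch table keeps the routing explicit. The model names below are placeholders for whatever exact IDs your providers expose:

```python
# Placeholder model IDs; substitute the exact ones your providers expose
MODEL_ROUTES = {
    "media": "gemini-image-model",    # image/video generation
    "tool_calling": "gpt-4o",         # structured tool/function calls
    "copy": "claude-writing-model",   # headings, captions, long-form text
}

def model_for(task: str) -> str:
    """Resolve a task type to its assigned model, failing loudly on unknown tasks."""
    if task not in MODEL_ROUTES:
        raise ValueError(f"No model route defined for task: {task!r}")
    return MODEL_ROUTES[task]
```

Failing loudly on an unknown task type is deliberate: in a multi-model workflow, a silent fallback to the wrong model is much harder to debug than an early error.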
If we are actually talking about a system that posts 10 posts a day without compromising quality, I would never rely on fully autonomous logic. Instead, I would always keep a human in the loop, so the workflow can restart with human remarks if something is slightly off. For now, focus on the content quality and how you can articulate that to Canva; once that is done, you can move forward to Supabase as your database. And for media generation, the problem is that it is a bit hard to get a consistent output every time, so that is something you tackle with very large, detailed prompts. Otherwise, it is your call how you shape this!
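The "restart with human remarks" loop can be sketched as a small control function. Everything here is a hypothetical stand-in for the corresponding n8n branches: `generate` is the content pipeline, `review` is the human step (returning `None` to approve), and `publish` is the posting node:

```python
def run_with_review(generate, review, publish, max_rounds=3):
    """Human-in-the-loop: restart generation with reviewer remarks until approved."""
    remarks = None
    for _ in range(max_rounds):
        draft = generate(remarks)
        remarks = review(draft)      # human step: returns None to approve
        if remarks is None:
            publish(draft)
            return draft
    raise RuntimeError("Draft not approved within the allowed rounds")

# Toy demo: the reviewer rejects the first draft once, then approves
published = []
result = run_with_review(
    generate=lambda remarks: "Post copy" + (" + CTA" if remarks else ""),
    review=lambda draft: None if "CTA" in draft else "add a CTA",
    publish=published.append,
)
```

Capping the rounds matters at 10 posts a day: a draft that cannot pass review in a few attempts should escalate to a human rewrite rather than loop forever.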
Hi @lucmvandervliet-svg
I’ve seen that the best approach is focusing on architecture using a single source of truth—an LLM-generated ‘Campaign JSON’ with the headline, caption, CTA, visual theme, and branding rules. Then, in n8n, you split it into three stages: generating the image (text-free), rendering the overlay via HTML/Canvas or an API like Cloudinary/Canva, and finally publishing. Since baked-in text from image models is still inconsistent, this workflow ensures perfect alignment and versioning.
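A minimal sketch of that Campaign JSON as a typed document, using the field names mentioned above (the example values are invented). Each of the three stages reads from this one object instead of re-prompting:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Campaign:
    """Single source of truth produced by the LLM, consumed by all three stages."""
    headline: str
    caption: str
    cta: str
    visual_theme: str      # fed to the text-free image-generation stage
    branding_rules: dict   # fonts, colors, logo rules for the overlay stage

campaign = Campaign(
    headline="Summer Sale",
    caption="Three days only.",
    cta="Shop now",
    visual_theme="sunlit beach, minimalist, no embedded text",
    branding_rules={"font": "Inter", "primary_color": "#0055AA"},
)

# The serialized document is what gets passed between n8n stages and versioned
campaign_json = json.dumps(asdict(campaign))
```

Keeping the headline out of the image prompt and only in the overlay stage is what makes the text pixel-perfect regardless of what the image model does.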
The ‘single source of truth’ approach (JSON campaign doc) that @tamy.santos mentioned is solid. One thing I’d add: if you’re scaling to 10+ posts per day, consider a pre-generation step that validates alignment before posting. I’ve seen workflows fail silently when Canva API rate-limits hit or when the HTML-to-image render is slightly off. So: (1) Generate campaign JSON, (2) Render preview (HTML/Canvas), (3) Human review step (or auto-flag if alignment confidence <90%), (4) Publish. The review step kills automation speed but prevents brand mismatches at scale.
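The auto-flag in step (3) can be as small as a threshold gate. The 0.90 default mirrors the &lt;90% rule of thumb above; the confidence score itself is assumed to come from whatever alignment check you run on the rendered preview:

```python
def route_draft(alignment_confidence: float, threshold: float = 0.90) -> str:
    """Step (3) gate: auto-flag low-confidence renders instead of publishing blindly."""
    return "publish" if alignment_confidence >= threshold else "human_review"
```

The gate turns silent failures (rate-limited Canva calls, slightly-off HTML renders) into queued review items, so a bad render at scale costs you a delay rather than a branded mistake in public.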