Looking for recommendations on using a vision LLM. Has anyone had experience reading renovation/building plans with detailed, consistent results? Thanks in advance.
Hey @EVconnect welcome!
Vision LLMs (or VLMs) are one of my favourite topics.
- Read up on “vision RAG”; the techniques used there are kinda universal across VLM workflows.
- VLMs have progressed enough that self-hosting is feasible (QwenVL, PaddleVL). It requires a bit of technical know-how, but the main upside is significant cost savings compared to cloud.
- I’d recommend starting with the Google Gemini family of vision models. At time of writing, they’re definitely the best in the biz for building the confidence needed for demos and proofs of concept during evaluation.
- If you absolutely need to guarantee accuracy, VLM workflows can benefit from n8n’s human-in-the-loop functionality to get human approval before continuing.
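To make that last point concrete outside of n8n, here's a rough Python sketch of a human-approval gate. It is not n8n's API, and every name in it (`Extraction`, `review_gate`, `console_approver`, the example field names) is made up for illustration. It assumes your VLM step has already returned values read off the plan pages, and that nothing continues downstream until a person accepts each value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Extraction:
    """One value a VLM read off a plan page (hypothetical structure)."""
    page: int
    field: str
    value: str


def review_gate(
    extractions: list[Extraction],
    approve: Callable[[Extraction], bool],
) -> tuple[list[Extraction], list[Extraction]]:
    """Split extractions into (approved, rejected) using a human decision.

    `approve` is whatever asks the human: a console prompt, a chat
    message, a web form. Only approved items should continue downstream.
    """
    approved: list[Extraction] = []
    rejected: list[Extraction] = []
    for item in extractions:
        if approve(item):
            approved.append(item)
        else:
            rejected.append(item)
    return approved, rejected


def console_approver(item: Extraction) -> bool:
    """Ask on the console; anything other than 'y' counts as a rejection."""
    answer = input(
        f"Page {item.page}: {item.field} = {item.value!r}. Accept? [y/N] "
    )
    return answer.strip().lower() == "y"


if __name__ == "__main__":
    # Example values only; in practice these come from the VLM step.
    results = [
        Extraction(page=1, field="wall_length_mm", value="3600"),
        Extraction(page=2, field="door_width_mm", value="820"),
    ]
    ok, needs_rework = review_gate(results, console_approver)
    print(f"{len(ok)} approved, {len(needs_rework)} sent back for rework")
```

Passing the approver in as a function keeps the gate independent of how the human is actually asked, so the console prompt can be swapped for a chat or email approval later.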
Check out these n8n examples here: Discover 7164 Automation Workflows from the n8n's Community
Thanks, your feedback is really helpful. Will dive in!