Hello, I’m trying to build an AI image analysis tool that writes an estimate, or at least identifies the damaged parts, of a car from a POST request with the VIN and an image of the damage. I vibecoded a site lol that sends JSON of the VIN and binary of the image via POST and then edits the image to put the VIN in the top left corner, and I got big company LLMs such as claude and chatGPT to read the vin in the top left then identify the damage and respond with a JSON format of the damage and the car type my site can read and then show the response in a nice format to the user. But now I want to train my own model - I tried huggingface but I can only get that to work with a chat response for some reason and when I try it with a POST request it says something about conversational only. I dont want my site to be like a chatbot. And all the other tools such as AIcado and unsloth do not work with n8n. does anyone know how to do this? Thanks
@riversong87871 I don’t think training a model is necessary. Have you considered using Qwen VL models? They’re amazing at identifying objects and can even provide coordinates. Gemini 2.5 Pro also has impressive vision capabilities
I haven’t tried Qwen yet. Can you help me with setting up Hugging Face to work with a POST request? I’ll also take a look at Gemini. Currently, I’m using Opus 4.1 and have experimented with GPT 5
I’ve used qwen via openrouter and had it examine images. There’s some free qwen versions on there but they’re slower so I use the paid ones. I use python for the requests.
Just tested the gemini 2.5 pro on openrouter in my automation and so far it is really good.
Thanks for the attention
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.