Have a look at the example provided in this post and let me know if this helps. You can use the Mistral Vision APIs to extract data from documents using a prompt. You should be able to instruct it on how you want the data to be returned.
If you are able to provide an example document and the json structure you expect, I can try and build an example workflow.