Hello!
I’m trying to build out a workflow with an AI Agent that can assist a sales team in generating quotes and retrieving product information.
There’s a chat trigger where the user can upload a file. I’ve got my workflow working with .CSV files and with .PDF files that contain text.
I’ve hit a bit of a roadblock on making this work when:
- A PDF contains images rather than text.
- An XLSX file is uploaded with multiple sheets (I could only extract the first sheet).
I’ve started exploring adding some OCR functionality, but I think that for the image-based PDFs mentioned above I’d also need to convert the file to an image before passing it to an OCR service.
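One common way to decide whether a PDF needs OCR at all is to check how much text normal extraction returns: pages with little or no text are probably scanned images. Below is a minimal sketch of that heuristic, assuming you already have the extracted text split per page (the input shape, the function name `pages_needing_ocr`, and the 20-character threshold are all hypothetical, not something n8n provides).

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 0-based indices of pages whose extracted text looks empty.

    Heuristic (an assumption, tune for your documents): a page with fewer
    than `min_chars` non-whitespace-trimmed characters is treated as an
    image-only page that should be sent to an OCR service.
    """
    return [i for i, text in enumerate(page_texts) if len(text.strip()) < min_chars]


# Example: page 0 has real text, pages 1 and 2 look like scans.
pages = ["Product list: widget A, widget B", "", "  \n "]
print(pages_needing_ocr(pages))
```

If every page comes back in the list, the whole file can go straight to OCR; if none do, the existing text path is enough.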
Ideally I’d like to build an agent that can handle:
Here is my current workflow
I’m also using gpt-4o-mini as my agent model, if that’s relevant, and text-embedding-3-small for embeddings.
Any advice would be most appreciated!
Thanks,
Dan
Hey, there are some vision-capable models out there that specialize in PDF understanding. The first one I can think of is the newest from Mistral (Mistral OCR | Mistral AI), which seems really good and might be what you’re looking for. You wouldn’t need to worry about images inside PDFs, since the model can take entire PDFs.
For the XLSX file with multiple sheets, you can specify the name of the sheet to read; the default is the first one. One problem, though, is that you can’t get the sheet names, so you can either try the default names “Sheet 2”, “Sheet 3”, etc., or, if you are self-hosted, install an xlsx package so you can read the sheet names in a Code node and loop over each sheet in your workflow.
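As an alternative to installing a package, an XLSX file is a ZIP archive whose sheet names are listed in `xl/workbook.xml`, so the Python standard library can read them. This is a minimal sketch under two assumptions: that the workbook part lives at `xl/workbook.xml` (where Excel puts it, though the spec allows other locations), and that you can get the uploaded file’s raw bytes into the function; how you do that inside an n8n Code node is not shown here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the <workbook>, <sheets> and <sheet> elements.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheet_names(xlsx_bytes: bytes) -> list[str]:
    """Return the sheet names of an XLSX file, in workbook order."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        # Assumption: the workbook part is at the conventional path.
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    # Structure: <workbook><sheets><sheet name="..." sheetId="..."/></sheets></workbook>
    return [sheet.get("name") for sheet in root.findall("main:sheets/main:sheet", NS)]
```

Each returned name could then be fed, one item per sheet, into the sheet name option of the extraction step, instead of guessing “Sheet 2”, “Sheet 3”.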
A last solution I can see would be to upload your XLSX to OneDrive and retrieve the sheets with the Excel 365 node’s “Get sheets” operation.
Let me know if this helps!
Thanks for the advice!
I’m using the cloud version of n8n, which I’m starting to realise does come with some limits compared to self-hosting.
Would Mistral as an AI Agent just process a file from the chat trigger without having to convert/vectorise the contents first?
According to the documentation, I don’t think you can plug it in directly as the AI Agent’s model, but you could create a workflow that extracts the content via the API and pass it to your main agent as a tool.
There is a template that uses Mistral OCR to extract data from PDFs (Parse and Extract Data from Documents/Images with Mistral OCR | n8n workflow template); I think it would be a good starting point!
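For reference, the “extract the content via API” step boils down to one HTTP request. Here is a minimal standard-library sketch, assuming Mistral’s OCR endpoint (`https://api.mistral.ai/v1/ocr`), the `mistral-ocr-latest` model name, base64 data-URI input, and a response with a `pages` list whose items carry a `markdown` field, all as I recall them from Mistral’s docs; please double-check against the current API reference. The function names are hypothetical. In n8n the same request could go in an HTTP Request node inside the tool workflow.

```python
import base64
import json
import urllib.request

# Assumed endpoint; verify against Mistral's API reference.
MISTRAL_OCR_URL = "https://api.mistral.ai/v1/ocr"


def build_ocr_request(pdf_bytes: bytes, api_key: str,
                      model: str = "mistral-ocr-latest") -> urllib.request.Request:
    """Build (but do not send) a POST request that asks for OCR of a whole PDF."""
    # Assumption: the API accepts the PDF inline as a base64 data URI.
    data_uri = "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode("ascii")
    payload = {"model": model, "document": {"type": "document_url", "document_url": data_uri}}
    return urllib.request.Request(
        MISTRAL_OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def pages_to_text(response: dict) -> str:
    """Join the per-page markdown of an OCR response into one text for the agent."""
    return "\n\n".join(page.get("markdown", "") for page in response.get("pages", []))
```

Sending it would look like `with urllib.request.urlopen(req) as resp: text = pages_to_text(json.load(resp))`; the joined text is what the tool would hand back to the main agent (or embed with text-embedding-3-small).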
That’s great, thank you for the help.