I’m trying to build a workflow in n8n and I would really appreciate your guidance on the following use case.
I receive emails in Gmail that contain attachments. These attachments can be in various formats, such as:
PDF
JPEG / PNG (images)
XLSX
DOCX
and similar file types
My goal is:
To detect incoming emails using the Gmail node.
To access and read the content of the attached file.
To extract meaningful information from that file (text, tables, or data, depending on the format).
To send this extracted content to an AI Agent (e.g., OpenAI, or another LLM).
To let the AI process the content and return a structured or natural-language response.
Ideally, I want a flexible solution that can handle different file types and route them through the appropriate processing logic (OCR for images, text extraction for PDFs, parsing for Excel, etc.).
My questions:
What is the best way to access and process attachments from the Gmail node?
Are there existing nodes or community workflows for parsing PDFs, images (OCR), XLSX, and DOCX files?
How would you recommend structuring this pipeline before sending the content to an AI node?
Any examples, best practices, or references would be extremely helpful.
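For your workflow, you’ll want to start with the Gmail node to fetch the emails and attachments. The attachments arrive as binary data; in current n8n versions the “Extract from File” node reads them (it replaces the older “Move Binary Data” node).
Next, you can use an “IF” or “Switch” node to create different paths for handling each file type (PDF, image, XLSX, DOCX). For PDFs and XLSX files, try the “Extract from File” node, which has operations for both. For images, you will need an OCR step; as far as I know n8n has no built-in OCR node, so an external OCR service called through the “HTTP Request” node is one option. For DOCX, you might need to use the “Code” node with a suitable library to parse the files.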
Finally, send the extracted text to an AI node like OpenAI to process the content.
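To make the routing step concrete, here is a minimal sketch of the decision an “IF” or “Switch” node would make for each attachment, written as plain Python. The route names (“pdf”, “image”, “spreadsheet”, “word”, “unsupported”) and the function name are my own placeholders, not n8n identifiers, and I am assuming each attachment carries a MIME type and a file name.

```python
from pathlib import PurePosixPath

# Hypothetical route names; in n8n each one would map to a Switch node output.
MIME_ROUTES = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "spreadsheet",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "word",
}

EXTENSION_ROUTES = {
    ".pdf": "pdf",
    ".xlsx": "spreadsheet",
    ".docx": "word",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
}


def choose_route(mime_type: str, file_name: str) -> str:
    """Return the processing branch for one attachment."""
    # Drop parameters such as "; charset=..." and normalize case.
    mime = (mime_type or "").split(";")[0].strip().lower()
    if mime.startswith("image/"):
        return "image"
    if mime in MIME_ROUTES:
        return MIME_ROUTES[mime]
    # Some mail clients send a generic type such as application/octet-stream,
    # so fall back to the file extension before giving up.
    suffix = PurePosixPath(file_name or "").suffix.lower()
    return EXTENSION_ROUTES.get(suffix, "unsupported")
```

Checking the MIME type first and the extension second is a judgment call; it keeps the workflow from silently dropping a spreadsheet that arrives with a generic content type. Anything that ends up as “unsupported” is probably best sent to a manual review branch.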
1. What is the best way to access and process attachments from the Gmail node?
The Gmail trigger has a “Download attachments” option, which would also give you file type information.
2. Are there existing nodes or community workflows for parsing PDFs, images (OCR), XLSX, and DOCX files?
In my opinion, Mistral AI’s OCR tool is the best out there right now.
There is a node available called “Mistral AI: Extract text”, or ideally you would call their API from an HTTP Request node to do the OCR transformation.
With this, I think you could handle the OCR process quite easily with only one Switch/IF statement (image or document).
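If you go the HTTP Request route, the single image-or-document branch could look roughly like the sketch below, which only builds the JSON body and sends nothing. The endpoint, the model name, and the “image_url” / “document_url” field names are what I understand from Mistral’s public API documentation, so treat them as assumptions and verify them before use; the function name is my own.

```python
import base64

# Assumed values taken from Mistral's public docs; verify before relying on them.
OCR_ENDPOINT = "https://api.mistral.ai/v1/ocr"
OCR_MODEL = "mistral-ocr-latest"


def build_ocr_request(file_bytes: bytes, mime_type: str) -> dict:
    """Build the JSON body for one OCR call from an attachment's raw bytes."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    data_uri = f"data:{mime_type};base64,{encoded}"
    # The one branch mentioned above: images and documents use different fields.
    if mime_type.startswith("image/"):
        document = {"type": "image_url", "image_url": data_uri}
    else:
        document = {"type": "document_url", "document_url": data_uri}
    return {"model": OCR_MODEL, "document": document}
```

The returned dictionary would be sent as a POST body to OCR_ENDPOINT with an “Authorization: Bearer <your API key>” header. In n8n the same body can be assembled with expressions inside the HTTP Request node.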
3. How would you recommend structuring this pipeline before sending the content to an AI node?
I would need to know more about the document size and structure before recommending how to pass the content to the AI agent.
The documents I receive are metal inquiry requests that I want the AI agent to process. They usually contain between 5 and 20 line items, so the data volume is not very large.
These requests can come in different formats, for example:
As images (photos or screenshots)
As Excel files
As Word documents
Sometimes as PDFs
Typical examples of the content look like this:
Pipe 60.3 x 3.6 – 10 pcs
UPN 200 – 12 pcs
Square tube 100 x 100 x 3 – 600 m
and so on.
My main goal is for the AI agent to:
Understand each line item
Extract structured data (type, dimensions, quantity, unit)
Normalize the information
And then pass it to the next step of my workflow (pricing, validation, etc.)
So the documents are relatively short, but the formatting is often inconsistent, which is why I rely on the AI for interpretation rather than strict parsing.
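Since the output feeds pricing and validation, it may help to pin down the target shape and check the AI’s answer before passing it on. Below is a minimal sketch, assuming the agent is prompted to return a JSON array of objects with the four fields listed above (type, dimensions, quantity, unit). The class name, the function name, and the unit alias table are hypothetical and would need to be adapted to real inquiries.

```python
import json
from dataclasses import dataclass

# Hypothetical alias table; extend it with the units seen in real inquiries.
UNIT_ALIASES = {
    "pcs": "pcs", "pc": "pcs", "pieces": "pcs",
    "m": "m", "meter": "m", "meters": "m", "metres": "m",
}


@dataclass
class LineItem:
    type: str
    dimensions: list[float]
    quantity: float
    unit: str


def parse_line_items(ai_response: str) -> list[LineItem]:
    """Validate the JSON returned by the model and normalize each line item."""
    items = []
    for raw in json.loads(ai_response):
        unit = UNIT_ALIASES.get(str(raw["unit"]).strip().lower())
        if unit is None:
            # Fail loudly so an unexpected unit never reaches the pricing step.
            raise ValueError(f"Unknown unit: {raw['unit']!r}")
        items.append(LineItem(
            type=str(raw["type"]).strip(),
            dimensions=[float(d) for d in raw.get("dimensions", [])],
            quantity=float(raw["quantity"]),
            unit=unit,
        ))
    return items
```

For example, “Pipe 60.3 x 3.6 – 10 pcs” would be expected back from the agent as {"type": "Pipe", "dimensions": [60.3, 3.6], "quantity": 10, "unit": "pcs"}. A missing field or an unknown unit raises an error, which is a convenient point to route the inquiry to manual review instead of pricing it.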