Gemini, Google Cloud and PDF

SKI · March 22, 2025, 8:20am

In my workflow, I want to send PDF files that are already in Google Drive DIRECTLY to Gemini via the API to extract information. This works perfectly on the Gemini website, where Gemini can draw on additional tools. Via the API, I’m currently sending the PDFs base64-encoded. Because the documents (PDFs) are all structured differently, this unfortunately doesn’t work very well. Is there an option equivalent to the website’s, but via the API, so I can integrate it into my n8n workflow? Thank you very much!
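For context, sending a PDF base64-encoded to the Gemini REST API usually means putting it in an `inline_data` part next to the text prompt of a `generateContent` request. The sketch below only builds that JSON body and does not call the network; the model name `gemini-2.0-flash` and the helper name `build_inline_pdf_request` are assumptions for illustration. Inline requests are size-limited, so larger PDFs are better sent through the File API.

```python
import base64
import json

# Assumption: model name; substitute the Gemini model you actually use.
MODEL = "gemini-2.0-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_inline_pdf_request(pdf_bytes: bytes, prompt: str) -> dict:
    """Build a generateContent body that embeds the PDF as base64 inline data."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    # The PDF itself, base64-encoded, with its MIME type.
                    {"inline_data": {"mime_type": "application/pdf", "data": encoded}},
                    # The instruction for the model.
                    {"text": prompt},
                ]
            }
        ]
    }


# Serialise the body; it would be POSTed to ENDPOINT with an "x-goog-api-key" header.
payload = json.dumps(build_inline_pdf_request(b"%PDF-1.4 ...", "List the invoice number."))
```

In n8n, the same body would go into an HTTP Request node (POST, JSON body), with the base64 string taken from the binary data of the Google Drive download.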

Miquel_Colomer · March 22, 2025, 9:16am

Hi @SKI,

What you’re running into makes sense — Gemini on the website can access tools like Google Drive directly, but the API version doesn’t have that same tool integration yet. So when you’re sending a PDF via API (e.g., base64-encoded), you’re not getting the same quality of parsing or context Gemini has when it reads the file natively through Drive.

Right now, the public Gemini API doesn’t expose the built-in integrations the web app uses, including direct file access from Google Drive. The API does support function calling, but any tools it calls have to be implemented on your side. So even though it works great on the site, you can’t replicate that behavior 1:1 via the API just yet.

What you can do for now is try one of these options:

– If you can download the file in your n8n workflow, use an external PDF-to-text node or tool (like PDF.co or pdftotext) to pre-process the file before sending the content to Gemini.

– Use the Gemini API’s File API (part of the v1beta API) to upload the PDF and reference it in your request, instead of embedding raw base64 in the request body. This is still limited, but it might give better results than sending base64 directly.
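In the REST API, a File API upload is a two-step resumable upload: a `start` request that returns an upload URL in the `x-goog-upload-url` response header, then an `upload, finalize` request that carries the PDF bytes and returns JSON with `file.name` (e.g. `files/abc123`) and `file.uri`. The sketch below only builds the headers for both steps, which would map onto two n8n HTTP Request nodes; the helper names and the display name are assumptions, and the API key still has to be added to the first request (as `?key=` or an `x-goog-api-key` header).

```python
import json

UPLOAD_URL = "https://generativelanguage.googleapis.com/upload/v1beta/files"


def start_upload_request(num_bytes: int, display_name: str) -> tuple[dict, str]:
    """Headers and JSON body for step 1: start a resumable upload."""
    headers = {
        "X-Goog-Upload-Protocol": "resumable",
        "X-Goog-Upload-Command": "start",
        "X-Goog-Upload-Header-Content-Length": str(num_bytes),
        "X-Goog-Upload-Header-Content-Type": "application/pdf",
        "Content-Type": "application/json",
    }
    body = json.dumps({"file": {"display_name": display_name}})
    return headers, body


def upload_url_from_headers(response_headers: dict) -> str:
    """Step 1 returns the URL for step 2 in the x-goog-upload-url header."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {k.lower(): v for k, v in response_headers.items()}
    return lowered["x-goog-upload-url"]


def finalize_upload_headers(num_bytes: int) -> dict:
    """Headers for step 2: send the raw PDF bytes and finalize in one request."""
    return {
        "Content-Length": str(num_bytes),
        "X-Goog-Upload-Offset": "0",
        "X-Goog-Upload-Command": "upload, finalize",
    }


def file_uri_from_response(upload_response: dict) -> str:
    """The finalize response looks like {"file": {"name": ..., "uri": ...}}."""
    return upload_response["file"]["uri"]
```

The `uri` returned here is what a later `generateContent` request refers to.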

If you’re looking for structured extraction from mixed-format PDFs, the best short-term workaround is to pre-process the file content outside Gemini and only use Gemini for higher-level reasoning on the parsed text.
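The pre-processing route can be sketched as two small steps: extract the text with poppler’s `pdftotext` (the `-layout` flag keeps columns aligned, and `-` as the output file writes to stdout), then send only that text to Gemini. This assumes `pdftotext` is installed where the step runs (for example behind an n8n Execute Command node); the helper names and the prompt layout are hypothetical.

```python
import shutil
import subprocess


def pdf_to_text(pdf_path: str) -> str:
    """Extract text with poppler's pdftotext; -layout keeps column alignment."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) is not installed")
    # "-" as the output file makes pdftotext write the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_text_request(document_text: str, instructions: str) -> dict:
    """Text-only generateContent body: instructions first, then the document text."""
    prompt = f"{instructions}\n\n--- DOCUMENT TEXT ---\n{document_text}"
    return {"contents": [{"parts": [{"text": prompt}]}]}
```

Keeping the instructions separate from the extracted text makes it easy to reuse the same prompt for every invoice.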

Let me know how your workflow is built and I can suggest a couple of solid n8n node options for PDF handling before hitting the API.

I hope this helps.

SKI · March 22, 2025, 9:45am

Thank you for your response.
I use Google Workspace, so I have access to all Google Cloud and Gemini Advanced products.
If possible, I would like to stick to these tools that are available.
As you correctly stated, the Gemini web interface uses various tools to analyze a document (for example, an incoming invoice) and delivers nearly 100% correct answers.
I have not been able to achieve this level of quality with the APIs so far, hence my request.
I’ll briefly describe the process and attach a screenshot of my current workflow, which I planned in n8n. Note that the loop is currently a problem: because the extraction quality is low, I run the extraction twice as a double-check and only continue if both results match. I haven’t found a way to end the loop, though; it keeps running even after a match occurs.
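The double-check can be expressed as a bounded loop that exits as soon as two extraction runs agree, instead of always running to the end. The sketch assumes a hypothetical `extract` callable (in the workflow, the Gemini call) that returns a dict of fields; values are normalised before comparing, and after `max_attempts` without agreement it returns `None` so the document can be flagged for review. In n8n, the rough equivalent is to route only the "no match" branch of an IF node back into the loop, so a match leaves it.

```python
from typing import Callable, Optional


def normalise(fields: dict) -> dict:
    """Compare values loosely: strip whitespace and ignore case."""
    return {k: str(v).strip().lower() for k, v in fields.items()}


def double_checked_extract(
    extract: Callable[[], dict], max_attempts: int = 4
) -> Optional[dict]:
    """Run extract() until two runs agree; return the agreed result or None."""
    seen = []
    for _ in range(max_attempts):
        result = extract()
        candidate = normalise(result)
        # Stop as soon as this run matches ANY earlier run.
        if candidate in seen:
            return result
        seen.append(candidate)
    return None  # no agreement: flag the document for manual review
```

Comparing against all earlier runs, not just the previous one, means a single outlier does not force extra attempts.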
Desired process:

1. Trigger on new PDF files (documents) in a Google Drive folder
2. Process the PDF file (extraction)
3. Query the following fixed fields from the document:
   - Document date
   - Supplier
   - Document recipient
   - Document number
   - Total amount
   - Due date for the early-payment discount
   - General due date (payment term)
4. Transfer the data to a Google Sheet
5. Rename and move the PDF to another folder
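For the fixed fields in step 3, the Gemini API’s structured output (`response_mime_type` set to `application/json` plus a `response_schema`) can constrain the model to return exactly these keys, which simplifies both the double-check comparison and the Google Sheets mapping. The field names, the date format, the `date_supplier_number.pdf` filename pattern, and the helper functions below are assumptions for illustration, not part of the original workflow.

```python
import json
import re

# Hypothetical field names for the seven values listed in step 3.
FIELDS = [
    "document_date",
    "supplier",
    "recipient",
    "document_number",
    "total_amount",
    "discount_due_date",
    "due_date",
]

# Goes into the request as "generation_config".
GENERATION_CONFIG = {
    "response_mime_type": "application/json",
    "response_schema": {
        "type": "OBJECT",
        "properties": {
            "document_date": {"type": "STRING", "description": "YYYY-MM-DD"},
            "supplier": {"type": "STRING"},
            "recipient": {"type": "STRING"},
            "document_number": {"type": "STRING"},
            "total_amount": {"type": "NUMBER"},
            "discount_due_date": {"type": "STRING", "description": "YYYY-MM-DD, omit if none"},
            "due_date": {"type": "STRING", "description": "YYYY-MM-DD"},
        },
        # Not every invoice offers a discount, so that field stays optional.
        "required": [f for f in FIELDS if f != "discount_due_date"],
    },
}


def to_sheet_row(model_text: str) -> list:
    """Parse the model's JSON answer into a Google Sheets row in FIELDS order."""
    data = json.loads(model_text)
    return [data.get(name, "") for name in FIELDS]


def new_filename(model_text: str) -> str:
    """Build a file name such as 2025-03-22_ACME-GmbH_RE-1001.pdf."""
    data = json.loads(model_text)
    parts = [data["document_date"], data["supplier"], data["document_number"]]
    # Replace runs of characters that are unsafe in file names with a hyphen.
    safe = [re.sub(r"[^\w.-]+", "-", str(p)).strip("-") for p in parts]
    return "_".join(safe) + ".pdf"
```

The same parsed JSON can then feed both the Google Sheets node (step 4) and the rename in step 5.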
Could you please help me?

[Screenshot of the current n8n workflow]

Miquel_Colomer · March 22, 2025, 10:11am

Could you share the workflow in JSON format?

That would make it easier to analyze.

SKI · March 22, 2025, 10:33am

When I download the workflow, does it contain only the credential IDs, or the full credentials as well?

Miquel_Colomer · March 22, 2025, 10:47am

It contains the structure and execution order of your workflow and the integrations it uses.

Credentials are not included, unless you hard-coded them in HTTP Request nodes.

The workflow helps me understand what you are trying to do and spot possible problems.
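If a workflow does contain hard-coded keys (for example a Gemini API key pasted into an HTTP Request node’s URL as `?key=...` or into a header), it is worth scrubbing the exported JSON before posting it. The sketch below is a hypothetical helper, not an n8n feature: it walks the exported JSON recursively, masks values under key names that look secret, masks `name`/`value` parameter pairs whose name looks secret, and masks `key=` query parameters in URLs. The credential references in a node’s `credentials` object are left as they are. The name patterns are assumptions to adjust for your workflow.

```python
import re

# Key or parameter names whose values should be hidden (assumed patterns).
SECRET_NAME = re.compile(r"(api[_-]?key|token|secret|password|authorization|^key$)", re.I)
# Matches key=... in a URL query string, e.g. ...:generateContent?key=AIza...
URL_KEY = re.compile(r"([?&]key=)[^&\s]+", re.I)


def redact(node):
    """Return a copy of the exported workflow JSON with likely secrets masked."""
    if isinstance(node, dict):
        out = {}
        for k, v in node.items():
            if isinstance(v, str) and SECRET_NAME.search(k):
                out[k] = "***"
            else:
                out[k] = redact(v)
        # Header/query parameters are stored as {"name": ..., "value": ...} pairs.
        name = out.get("name")
        if isinstance(name, str) and SECRET_NAME.search(name) and "value" in out:
            out["value"] = "***"
        return out
    if isinstance(node, list):
        return [redact(item) for item in node]
    if isinstance(node, str):
        return URL_KEY.sub(r"\1***", node)
    return node
```

Load the exported file with `json.load`, pass it through `redact`, and share the `json.dumps` output.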

SKI · March 29, 2025, 1:19pm

Hi @Miquel_Colomer!

Can I send it to you directly, or should I post it here?

SKI · April 25, 2025, 8:43am

Hello!

Has no one found a solution?

I’ve now managed to upload the PDF via the Gemini File API in a workflow and then get its file ID back from the File API. What I still don’t know is how to set up the AI request with a prompt that refers to this file.
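Once the upload has returned the file’s metadata, the prompt request is an ordinary `generateContent` call in which one part references the file through `file_data` and another part carries the prompt text. The documented examples pass the `uri` field from the upload response (a full URL ending in `files/<id>`) rather than the bare `name`. A minimal sketch, assuming the model name `gemini-2.0-flash` and hypothetical helper names; in n8n the body would go into an HTTP Request node (POST, JSON body) with the API key in an `x-goog-api-key` header.

```python
import json

MODEL = "gemini-2.0-flash"  # assumption: use whichever model you have access to
GENERATE_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_file_request(file_uri: str, prompt: str) -> dict:
    """generateContent body: the uploaded PDF plus a text prompt about it."""
    return {
        "contents": [
            {
                "parts": [
                    {"file_data": {"mime_type": "application/pdf", "file_uri": file_uri}},
                    {"text": prompt},
                ]
            }
        ],
        # Ask for JSON back so the next node can parse the answer.
        "generation_config": {"response_mime_type": "application/json"},
    }


def answer_text(response: dict) -> str:
    """Pull the model's text out of a generateContent response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)
```

Files uploaded through the File API are deleted automatically after 48 hours, so the upload and the prompt request belong in the same workflow run.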

Please help, thank you!