Invalid PDF structure

Hello community, I have a workflow that is working well. It gets invoices from Gmail into Drive and Google Sheets. When I receive a ZIP file, it decompresses it, puts the contents in a specific folder, and sends the data to a spreadsheet; it does the same with PDFs.
One specific vendor sends its invoices as a compressed PDF, and in this case the workflow does not work. Instead of a decompressed PDF like the other vendors' invoices, the file lands in Drive as a Google Docs document full of strange characters. The workflow also fails to send the data to the spreadsheet.

That output is XML.

Are you confident that what you downloaded should be a PDF?

Can you download and decompress manually on your computer and see what the file comes out to be?
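
If it helps with that manual check, here is a minimal sketch in plain Python (run outside n8n, not an n8n API) that guesses a file's type from its first bytes rather than from its name. The helper name `sniff_kind` is hypothetical, and the check is only a heuristic: PDFs start with `%PDF-`, ZIP archives with `PK\x03\x04`, and XML with `<`.

```python
def sniff_kind(data: bytes) -> str:
    """Guess a file's type from its leading bytes (hypothetical helper)."""
    # Only the start of the file matters; ignore leading whitespace.
    head = data[:1024].lstrip()
    # Skip a UTF-8 byte order mark, which some XML files carry.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:].lstrip()
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    if head.startswith(b"<"):
        return "xml"
    return "unknown"
```

Reading the decompressed file as bytes and passing it to `sniff_kind` would show whether it is really a PDF or an XML file that only looks like one by name.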

It could also be a mistake if what you downloaded was a link. AWS commonly formats its file data definitions as XML, similar to what you show.

Yes, I am 100% confident. When I open the email without the workflow, this is how it looks:

The PDF opens normally when accessed manually, and the content is readable.

I solved the issue! I made some changes to the Structured Output Parser, and now it is working.


Well, it looks like the unzipped archive contains two files: the PDF and an XML definition file.

I assume n8n is picking the XML file instead of the PDF for some reason when decompressing.
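
To illustrate the idea of picking the PDF explicitly when an archive holds more than one file, here is a rough sketch in plain Python using the standard `zipfile` module. It is not how n8n decompresses files, and the function name `pick_pdf` is made up for this example. It selects the first entry whose content starts with the PDF header `%PDF-`, regardless of the order of entries in the archive.

```python
import io
import zipfile
from typing import Optional, Tuple


def pick_pdf(zip_bytes: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (filename, content) of the first PDF in a ZIP, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Check the content itself, not the file extension.
            with archive.open(info) as member:
                header = member.read(5)
            if header == b"%PDF-":
                return info.filename, archive.read(info)
    return None
```

In a workflow, the equivalent step would be to filter the decompressed items so that only the PDF reaches the Drive upload and the parser, and the XML file is ignored.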

Edit: what was the change? I may be missing it, but I don't see a change. Curious what fixed this.
