I got tired of manually copying invoice data every week, so I built this workflow to handle it automatically. Works with any invoice format - no templates needed

What it does

Watches a Google Drive folder for invoices, extracts all the data, validates the numbers, and logs everything to a Google Sheet. Takes about 6 seconds per invoice.

Here’s what gets extracted:

  • Invoice number, dates, vendor, customer info

  • Line items with descriptions, quantities, and prices

  • Subtotal, tax, and total

  • Works with multi-page invoices

  • Handles 15+ languages

How it works

Upload invoice → AI extracts data → validates calculations → saves to Sheets → sends Slack notification

The validation part is helpful - it checks if the line items actually add up to the subtotal and flags any issues. Saves you from data entry errors.

[Screenshot of workflow]

What I’ve tested it with

Been using this for a few months now with:

  • Utility bills (electric, water, internet)

  • Vendor invoices from Amazon, Office Depot, and a bunch of random suppliers

  • Service invoices from freelancers

  • Scanned PDFs (the OCR works pretty well)

  • Even photos of invoices taken on my phone

Accuracy is around 98% on digital PDFs, drops to about 92% on scanned stuff. Not perfect but way better than typing everything manually.

Setup

Takes about 15 minutes to set up if you follow the guide below. You’ll need:

  • Google Drive and Sheets (free)

  • n8n instance

  • PDF Vector account (free tier gives you 100 credits/month, enough for ~33 invoices)

  • Slack is optional - you can skip those nodes if you don’t use it

Download

Workflow file attached:

document-extraction-done.json (8.2 KB)


Setup Guide

Getting Started

Step 1: Get your API key

Go to https://www.pdfvector.com and sign up. Free plan works fine for testing. Navigate to the API Keys section and generate a new key. You’ll need this in Step 5.

Step 2: Prepare your Google Drive

Create a folder called “Invoices” in your Google Drive. This is where you’ll drop invoice files.

Step 3: Set up your Google Sheet

Create a new sheet and add these headers in the first row:

Invoice Number | Invoice Date | Due Date | Vendor | Customer | Subtotal | Tax | Total | Currency | Items | Status | Warnings | File Name | Processed Date

Copy the Sheet ID from the URL - it’s the long string between /d/ and /edit.

Step 4: Import the workflow

In n8n, click Import from File and select the JSON file. The workflow will show up in your canvas.

Step 5: Configure the nodes

Google Drive Trigger:

  • Connect your Google account

  • Select your “Invoices” folder

Download Invoice:

  • Use the same Google credential

PDF Vector - Extract Invoice:

  • Add new credential (Bearer Token)

  • Paste your API key from Step 1

Log to Invoice Database:

  • Connect your Google account

  • Paste your Sheet ID

  • Make sure the sheet name matches (usually “Sheet1”)

Slack nodes (optional):

  • Connect your workspace and pick a channel

  • Or just delete these if you don’t want notifications

Step 6: Test it

Activate the workflow (toggle in top-right), then upload a test invoice to your Google Drive folder. Wait about a minute and check your Google Sheet. Should see the extracted data there.

What gets extracted

The workflow pulls out:

Header info: Invoice number, dates, currency

Vendor details: Company name, address, tax ID

Customer details: Name and billing address

Line items: Full breakdown of each item with description, quantity, unit price, and total

Totals: Subtotal, tax amount, and final total

The validation logic

This is the part that actually saves time on corrections later:

Line item check: Makes sure all the line items add up to the subtotal. If there’s a difference bigger than a penny, it logs a warning.

Total calculation: Verifies that subtotal + tax = total amount. Also logs a warning if something’s off.

Required fields: Checks that you have an invoice number, vendor name, and total amount. If any of these are missing, marks the invoice as “Invalid” so you know to look at it manually.

The warnings are usually just rounding differences - most invoices have these and they’re not actually errors. But it’s good to know about them.

Cost

Free tier is 100 credits per month. Each invoice uses about 3 credits, so you can do ~33 invoices for free.

If you need more, the basic plan is $25/month for 3,000 credits (around 1,000 invoices). Still way cheaper than paying someone to do data entry.

Customizing it

Add more fields:

Open the “PDF Vector - Extract Invoice” node and edit the schema. You can add things like purchase order numbers, payment terms, or shipping addresses. Just add them to the properties section:

"purchaseOrderNumber": { "type": "string" },
"paymentTerms": { "type": "string" }

Change where it saves:

Don’t have to use Google Sheets. Replace that node with Airtable, Notion, or even a direct QuickBooks integration if you want.

Process emails instead:

Swap the Google Drive Trigger with a Gmail Trigger to handle invoice attachments directly from email. Everything else stays the same.

Multi-currency:

Add a Code node after validation to convert amounts to your base currency using an exchange rate API.

Troubleshooting

Workflow doesn’t trigger: Check that the Active toggle is on and the folder path is correct. Look at the execution logs if you’re not sure what’s happening.

No data extracted: Make sure your API key is right and you haven’t run out of credits. Also check that the PDF actually has invoice data in it.

Getting validation warnings: This is normal. Most invoices have small rounding differences between line items and subtotals. The data still gets saved, just flagged so you know about it.

Slack not working: Verify your credentials are connected and you have permission to post in the channel. You can test the Slack node manually by clicking “Execute Node”.

Google Sheets errors: Double-check your Sheet ID is correct and the sheet name matches. Also make sure the column headers are exactly right - they’re case-sensitive.

Performance notes

I’ve run about 50 different invoice formats through this. Here’s what I’ve found:

  • Digital PDFs: ~98% accurate, processes in 3-5 seconds

  • Scanned documents: ~92% accurate, takes 8-12 seconds

  • Phone photos: ~87% accurate, usually 8-15 seconds

  • Multi-page invoices: Add about 2 seconds per extra page

Works with English, Spanish, French, German, and a bunch of other languages. Haven’t found one it completely fails on yet, though handwritten stuff can be hit or miss.

Why this approach

I tried a few things before landing on this:

Template-based systems need you to train a model for each vendor. Pain to maintain and breaks when they change their invoice format.

Basic OCR just gives you text without any structure. You still have to parse it yourself.

ChatGPT/Claude are inconsistent with field extraction and sometimes make up numbers. Also expensive if you’re doing volume.

This uses a schema-based approach - you define the fields once and it works for any vendor. No training, no templates, consistent output.

Limitations

Not perfect. Here’s what doesn’t work well:

  • Extremely messy handwritten invoices

  • Invoices with weird custom fields (it’ll still try but might miss them)

  • Very low quality scans where text is barely readable

For these cases, the workflow will either flag them as invalid or log warnings so you can check manually.

Links

Questions?

Drop a comment if you run into issues or want to customize something. I’ll help if I can.