I’m processing invoices that arrive as PDF attachments by email. The purpose of the flow is to extract the invoice info and create supplier invoices in an upstream system via an API.
At the core is an extraction node that pulls the relevant info from the invoice into a JSON object. The invoice has both an invoice number and a PO number, and I need both. The extraction mixes these up every so often. What adds complexity is that different invoices may use different terms for the invoice number and the PO number.
Any ideas on how I can strengthen this data extraction?
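To make the question concrete, here is a rough sketch of one kind of check I could imagine adding after the extraction node. It looks up the label that sits just before each extracted value in the PDF text and swaps the two fields if the labels say they are reversed. Everything in it is an assumption for illustration: the label lists, the function names (`label_before`, `classify`, `fix_swapped`), the JSON keys `invoice_number` and `po_number`, and the idea that the raw PDF text is available alongside the extracted JSON.

```python
# Hypothetical label variants; real invoices would need a longer, tuned list.
INVOICE_LABELS = ["invoice number", "invoice no", "invoice #", "inv no", "bill number"]
PO_LABELS = ["purchase order", "po number", "po no", "po #", "order reference"]


def label_before(text: str, value: str, window: int = 40) -> str:
    """Return the lowercased text just before the first occurrence of value."""
    pos = text.find(value)
    if pos == -1:
        return ""
    return text[max(0, pos - window):pos].lower()


def classify(text: str, value: str) -> str:
    """Guess which field a value belongs to from the nearest preceding label."""
    context = label_before(text, value)
    best_field, best_pos = "unknown", -1
    for field, labels in (("invoice_number", INVOICE_LABELS), ("po_number", PO_LABELS)):
        for label in labels:
            pos = context.rfind(label)
            # The label closest to the value (largest index) wins.
            if pos > best_pos:
                best_field, best_pos = field, pos
    return best_field


def fix_swapped(extracted: dict, text: str) -> dict:
    """Swap the two fields only if the labels in the text say both are reversed."""
    inv_kind = classify(text, extracted["invoice_number"])
    po_kind = classify(text, extracted["po_number"])
    if inv_kind == "po_number" and po_kind == "invoice_number":
        return {
            **extracted,
            "invoice_number": extracted["po_number"],
            "po_number": extracted["invoice_number"],
        }
    return extracted


# Made-up sample text and a deliberately swapped extraction result.
sample_text = "Invoice No: INV-1001\nPO Number: 450077\nTotal: 120.00"
swapped = {"invoice_number": "450077", "po_number": "INV-1001"}
fixed = fix_swapped(swapped, sample_text)
```

This only swaps when both labels disagree with the extraction, so an ambiguous or unlabeled value is left alone and could be routed to manual review instead. It also assumes each value appears once in the text, which may not hold for real invoices.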