good thinking on the boundary-vs-semantic split, most people skip that. the one thing I'd be curious about is how badly it fails on edge cases like purchase orders that look like invoices. if you test with real docs, let me know what the accuracy patterns are.
Thanks, that’s exactly the failure surface I’m trying to get closer to next.
So far I’ve only shown narrow boundary-oriented behavior, not document-classification accuracy.
So for the next case, I’m specifically looking for:
- an invoice-like document workflow
- a small explicit label set
- at least one example where the expected label is already known
- ideally one ambiguous case, such as a purchase order that could be mistaken for an invoice
What I want to observe is not just:
- did it stay inside the allowed class set?
but also:
- did it match the expected label for this case?
- what downstream action would break if it was wrong?
- how should ambiguous cases be handled: hold, review, or fail_safe?
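
To make those checks concrete, here is a rough sketch of how I'd score a single case. Everything in it is a placeholder assumption rather than the actual setup: the label set, the `Case` fields, the confidence score and its 0.8 threshold, and the rule that out-of-set predictions always go to fail_safe.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set and handling policies; swap in the real ones.
ALLOWED_LABELS = {"invoice", "purchase_order", "receipt"}
AMBIGUITY_POLICIES = {"hold", "review", "fail_safe"}


@dataclass
class Case:
    doc_id: str
    predicted: str          # label returned by the classifier
    expected: str           # known-correct label for this case
    confidence: float       # assumed score in [0, 1]; not every classifier exposes one
    downstream_action: str  # what acting on the label would trigger


def evaluate(case: Case, threshold: float = 0.8, policy: str = "review") -> dict:
    """Score one case on the boundary check, the semantic check, and handling."""
    if policy not in AMBIGUITY_POLICIES:
        raise ValueError(f"unknown policy: {policy}")

    # Boundary check: did the prediction stay inside the allowed class set?
    boundary_ok = case.predicted in ALLOWED_LABELS
    # Semantic check: did it match the expected label for this case?
    semantic_ok = boundary_ok and case.predicted == case.expected

    # Handling: out-of-set output is treated as fail_safe (an assumption);
    # low-confidence output follows the chosen ambiguity policy.
    if not boundary_ok:
        handling = "fail_safe"
    elif case.confidence < threshold:
        handling = policy
    else:
        handling = "proceed"

    # Downstream action that would break if this label were acted on.
    at_risk: Optional[str] = None if semantic_ok else case.downstream_action

    return {
        "doc_id": case.doc_id,
        "boundary_ok": boundary_ok,
        "semantic_ok": semantic_ok,
        "handling": handling,
        "at_risk": at_risk,
    }


if __name__ == "__main__":
    # Made-up example: a purchase order mistaken for an invoice.
    po = Case("po-001", "invoice", "purchase_order", 0.55, "schedule_payment")
    print(evaluate(po))
```

The case I'd most want to catch is the one this sketch lets through: a wrong label with high confidence passes the boundary check, gets `proceed`, and only shows up in `at_risk`.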
If you have even a simplified anonymized example, that would already help a lot.
Happy to continue by DM if easier.