Looking for 1 invoice-classification workflow with an explicit expected label

I’ve already tested 3 materially different workflows in a narrow Phase 1 setup.

What I have so far:

  • voucher validation
  • invoice extraction / strict schema validation
  • support email classification / routing

That helped me observe boundary behavior, but the next thing I want is one case that gets closer to semantic correctness.

So I’m looking specifically for 1 invoice-document classification workflow where the expected label is already known upfront.

A strong fit would be something like:

  • Invoice
  • PurchaseOrder
  • CreditMemo

What I want to test next is the separation between:

  • boundary safety = did the output stay within the allowed class set?
  • semantic correctness = did it match the expected label for this specific case?
  • downstream risk = what breaks if the label is wrong?
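To make the three checks concrete, here is a minimal Python sketch of scoring one run against them. It assumes the run yields a single predicted label string and that the expected label is known upfront. The label set is the one listed above; the `DOWNSTREAM_IF_WRONG` descriptions and the `evaluate` / `CaseEvaluation` names are hypothetical placeholders, since what actually breaks depends on the workflow owner's business rules.

```python
from dataclasses import dataclass
from typing import Optional

# Label set from the example above.
ALLOWED_LABELS = frozenset({"Invoice", "PurchaseOrder", "CreditMemo"})

# Hypothetical: what acting on each predicted label would break if that label is wrong.
# Real entries would come from the workflow owner's business rules.
DOWNSTREAM_IF_WRONG = {
    "Invoice": "a non-payable document could be queued for payment",
    "PurchaseOrder": "the document could be routed to PO matching instead of its real queue",
    "CreditMemo": "a credit could be applied to an account that is not owed one",
}
OUT_OF_SET_RISK = "output is outside the allowed class set; nothing downstream should consume it"


@dataclass(frozen=True)
class CaseEvaluation:
    boundary_safe: bool             # did the output stay within the allowed class set?
    semantically_correct: bool      # did it match the expected label for this case?
    downstream_risk: Optional[str]  # what breaks if the label is wrong (None when correct)


def evaluate(predicted: str, expected: str) -> CaseEvaluation:
    """Score one run on boundary safety, semantic correctness and downstream risk separately."""
    if expected not in ALLOWED_LABELS:
        raise ValueError(f"expected label {expected!r} is not in the label set")
    boundary_safe = predicted in ALLOWED_LABELS
    semantically_correct = boundary_safe and predicted == expected
    if semantically_correct:
        risk = None
    elif not boundary_safe:
        risk = OUT_OF_SET_RISK
    else:
        # Inside the class set but wrong: the risk is whatever the predicted label triggers.
        risk = DOWNSTREAM_IF_WRONG[predicted]
    return CaseEvaluation(boundary_safe, semantically_correct, risk)
```

For example, `evaluate("PurchaseOrder", "Invoice")` passes the boundary check but fails the semantic one, which is exactly the case the split is meant to surface.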

A short outline is enough first.
I do not need a full production payload immediately.

What helps:

  • rough payload shape
  • target schema
  • label set
  • one simple example with an explicit expected label
  • optionally one mixed / ambiguous example
  • a short note on the business rule for ambiguous handling
  • polling or webhook preference
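As a rough illustration of how short such an outline can be, here is one hypothetical shape in Python plus a small completeness check. Every field name, sample document field and value is invented for illustration and is not a required format; the ambiguity-rule values (hold, review, fail_safe) are an assumption about what a business rule might choose between.

```python
# Hypothetical case outline: field names and values are illustrative only.
CASE_OUTLINE = {
    "label_set": ["Invoice", "PurchaseOrder", "CreditMemo"],
    # Target schema for the classifier output (illustrative).
    "target_schema": {"label": "one of label_set", "reason": "short free-text string"},
    "examples": [
        {   # Simple case with an explicit expected label.
            "document": {"title": "INVOICE", "invoice_number": "INV-001", "amount_due": "120.00"},
            "expected_label": "Invoice",
        },
        {   # Optional ambiguous case: a purchase order laid out like an invoice.
            "document": {"title": "Purchase Order", "po_number": "PO-001", "total": "120.00"},
            "expected_label": "PurchaseOrder",
            "ambiguous": True,
        },
    ],
    "ambiguity_rule": "review",  # assumed options: hold, review, fail_safe
    "delivery": "polling",       # or "webhook"
}


def outline_problems(outline: dict) -> list:
    """Return the gaps in a case outline; an empty list means it is usable as a first pass."""
    problems = []
    labels = set(outline.get("label_set", []))
    if not labels:
        problems.append("label_set is empty")
    examples = outline.get("examples", [])
    # At least one non-ambiguous example with a known expected label.
    if not any(not ex.get("ambiguous", False) for ex in examples):
        problems.append("need at least one simple example with an expected label")
    for i, ex in enumerate(examples):
        if ex.get("expected_label") not in labels:
            problems.append(f"example {i}: expected_label is not in label_set")
    if outline.get("ambiguity_rule") not in {"hold", "review", "fail_safe"}:
        problems.append("ambiguity_rule should be hold, review or fail_safe")
    if outline.get("delivery") not in {"polling", "webhook"}:
        problems.append("delivery should be polling or webhook")
    return problems
```

The check mostly guards the one property the whole case depends on: every expected label must come from the same label set the output is validated against.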

What I can return:

  • whether the run ended as succeeded or failed_safe
  • a short reason if relevant
  • a receipt reference

This is still narrow Phase 1 work, not broad onboarding.

Public kit:

If you have a case like this, reply here or DM me.

Good thinking on the boundary-vs-semantic split; most people skip that. The one thing I’d be curious about is how badly it fails on edge cases like purchase orders that look like invoices. If you test with real docs, let me know what the accuracy patterns are.

Thanks — that’s exactly the failure surface I’m trying to get closer to next.

Right now I’ve only shown narrow boundary-oriented behavior, not document-classification accuracy yet.
So for the next case, I’m specifically looking for:

  • an invoice-like document workflow
  • a small explicit label set
  • at least one example where the expected label is already known
  • ideally one ambiguous case such as a purchase order that could be mistaken for an invoice

What I want to observe is not just:

  • did it stay inside the allowed class set?

but also:

  • did it match the expected label for this case?
  • what downstream action would break if it was wrong?
  • how should ambiguous cases be handled: hold, review, or fail_safe?
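One way to keep that handling decision explicit is to treat it as a configured business rule rather than something the classifier decides. The sketch below assumes, which this thread does not state, that the classifier exposes per-label scores; the `route` name and the `min_margin` threshold are hypothetical. A case counts as ambiguous when the top two labels are too close, and then the configured rule (hold, review or fail_safe) applies instead of a label.

```python
from typing import Mapping, Optional, Tuple

AMBIGUITY_ACTIONS = {"hold", "review", "fail_safe"}


def route(
    scores: Mapping[str, float],
    ambiguity_rule: str,
    min_margin: float = 0.2,
) -> Tuple[str, Optional[str]]:
    """Accept the top label, or apply the business rule when the case looks ambiguous.

    scores: hypothetical per-label scores from the classifier.
    min_margin: hypothetical threshold; the case counts as ambiguous when the
    top two labels are closer than this.
    """
    if ambiguity_rule not in AMBIGUITY_ACTIONS:
        raise ValueError(f"unknown ambiguity rule: {ambiguity_rule!r}")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        # No usable output at all: apply the rule rather than guessing a label.
        return ambiguity_rule, None
    top_label, top_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score - runner_up_score < min_margin:
        return ambiguity_rule, None
    return "accept", top_label
```

With scores like `{"Invoice": 0.55, "PurchaseOrder": 0.45, "CreditMemo": 0.0}` and a "review" rule, this returns `("review", None)` instead of committing to Invoice. Keeping the rule as data means the same classifier output can be held, reviewed or failed safe depending on what the downstream owner decides.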

If you have even a simplified anonymized example, that would already help a lot.
Happy to continue by DM if easier.

Quick update:

I’m now specifically looking for a 4th Phase 1 case that is closer to semantic correctness, not just boundary behavior.

What would help most:

  • an invoice-document classification workflow
  • a small explicit label set
  • at least one example with a known expected label
  • ideally one ambiguous case (for example, something invoice-like that could be misclassified)

What I need:

  • rough payload/document shape
  • target schema
  • label set
  • one example with an explicit expected label
  • optionally one ambiguous example
  • a short note on what breaks downstream if the label is wrong
  • webhook or polling preference

What I return:

  • whether the run ended as succeeded or failed_safe
  • a short reason if relevant
  • a receipt reference

This is still narrow Phase 1 work.
I’m not looking for broad onboarding or generic AI demos.

If you have a case like this, reply here or DM me.
If full payload sharing is hard, even a simplified anonymized outline is enough first.