Looking for 1 invoice-classification workflow with an explicit expected label

I’ve already tested 3 materially different workflow lines in a narrow Phase 1 setup.

What I have so far:

  • voucher validation
  • invoice extraction / strict schema validation
  • support email classification / routing

That helped me observe boundary behavior, but the next thing I want is one case that gets closer to semantic correctness.

So I’m looking specifically for 1 invoice-document classification workflow where the expected label is already known upfront.

A strong fit would be something like:

  • Invoice
  • PurchaseOrder
  • CreditMemo

What I want to test next is the separation between:

  • boundary safety = did the output stay within the allowed class set?
  • semantic correctness = did it match the expected label for this specific case?
  • downstream risk = what breaks if the label is wrong?
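
A minimal Python sketch of how I think about the first two checks. It assumes the three example labels above are the whole allowed set; `ALLOWED_LABELS`, `Evaluation`, and `evaluate` are hypothetical names for illustration, not part of any existing kit.

```python
from dataclasses import dataclass

# Assumption: the allowed class set is exactly the three example labels.
ALLOWED_LABELS = {"Invoice", "PurchaseOrder", "CreditMemo"}


@dataclass
class Evaluation:
    boundary_safe: bool          # output stayed within the allowed class set
    semantically_correct: bool   # output matched the expected label for this case


def evaluate(predicted: str, expected: str) -> Evaluation:
    """Score one case on boundary safety and semantic correctness separately."""
    boundary_safe = predicted in ALLOWED_LABELS
    # A label outside the allowed set can never count as semantically correct.
    semantically_correct = boundary_safe and predicted == expected
    return Evaluation(boundary_safe, semantically_correct)
```

A purchase order labeled `Invoice` is boundary-safe but semantically wrong, which is exactly the gap I want to observe. Downstream risk is left out on purpose: it is a business note supplied with the case, not something the label alone can tell me.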

A short outline is enough to start.
I do not need a full production payload right away.

What helps:

  • rough payload shape
  • target schema
  • label set
  • one simple example with an explicit expected label
  • optionally one mixed / ambiguous example
  • a short note on the business rule for ambiguous handling
  • polling or webhook preference
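
To make the list above concrete, here is a rough Python sketch of what a case outline could look like. Every field name here (`label_set`, `example_text`, `expected_label`, `ambiguous_rule`, `delivery`) is a hypothetical placeholder, not a required format; any shape that carries the same information works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseOutline:
    label_set: List[str]                   # the small explicit label set
    example_text: str                      # simplified / anonymized document content
    expected_label: str                    # the label known upfront for this example
    ambiguous_rule: Optional[str] = None   # short note on how ambiguous cases are handled
    delivery: str = "polling"              # "polling" or "webhook"


def outline_problems(outline: CaseOutline) -> List[str]:
    """Return a list of obvious gaps in an outline (an empty list means none found)."""
    problems = []
    if outline.expected_label not in outline.label_set:
        problems.append("expected_label is not in label_set")
    if outline.delivery not in ("polling", "webhook"):
        problems.append("delivery must be 'polling' or 'webhook'")
    return problems
```

The only hard requirement this sketch encodes is that the expected label belongs to the label set; everything else can stay rough.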

What I can return:

  • whether the run ended in succeeded or failed_safe
  • a short reason if relevant
  • a receipt reference
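
For orientation, a small Python sketch of that return shape. Only the two status values `succeeded` and `failed_safe` come from this post; the names `RunStatus`, `RunResult`, `receipt_ref`, and `summarize` are assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RunStatus(str, Enum):
    SUCCEEDED = "succeeded"
    FAILED_SAFE = "failed_safe"


@dataclass
class RunResult:
    status: RunStatus
    receipt_ref: str               # hypothetical field name for the receipt reference
    reason: Optional[str] = None   # short reason, only set if relevant


def summarize(result: RunResult) -> str:
    """Render a run result as a single human-readable line."""
    line = f"{result.status.value} (receipt: {result.receipt_ref})"
    if result.reason:
        line += f" - {result.reason}"
    return line
```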

This is still narrow Phase 1 work, not broad onboarding.

Public kit:

If you have a case like this, reply here or DM me.

Thanks — that’s exactly the failure surface I’m trying to get closer to next.

Right now I’ve only shown narrow boundary-oriented behavior, not document-classification accuracy yet.
So for the next case, I’m specifically looking for:

  • an invoice-like document workflow
  • a small explicit label set
  • at least one example where the expected label is already known
  • ideally one ambiguous case such as a purchase order that could be mistaken for an invoice

What I want to observe is not just:

  • did it stay inside the allowed class set?

but also:

  • did it match the expected label for this case?
  • what downstream action would break if it was wrong?
  • how should ambiguous cases be handled: hold, review, or failed_safe?

If you have even a simplified anonymized example, that would already help a lot.
Happy to continue by DM if easier.

Quick update:

I’m now specifically looking for a 4th Phase 1 case that is closer to semantic correctness, not just boundary behavior.

What would help most:

  • an invoice-document classification workflow
  • a small explicit label set
  • at least one example with a known expected label
  • ideally one ambiguous case (for example, something invoice-like that could be misclassified)

What I need:

  • rough payload/document shape
  • target schema
  • label set
  • one example with an explicit expected label
  • optionally one ambiguous example
  • a short note on what breaks downstream if the label is wrong
  • webhook or polling preference

What I return:

  • whether the run ended in succeeded or failed_safe
  • a short reason if relevant
  • a receipt reference

This is still narrow Phase 1 work.
I’m not looking for broad onboarding or generic AI demos.

If you have a case like this, reply here or DM me.
If sharing a full payload is hard, even a simplified anonymized outline is enough to start.