I built a support inbox router for a friend – turns out classification alone wasn't enough. Here's what I added

:waving_hand: Hey n8n Community,

Quick recap for those new to the Mike saga: Mike is a friend of mine who runs a small company. Over the last few months, I’ve helped him and his finance colleague Sarah automate a bunch of their invoice headaches – duplicate detection, Slack-based approvals, and most recently a smart mailroom that classifies every incoming email attachment and routes it to the right pipeline.

After we got the billing@ inbox under control, Mike came back with another one. “Can we do the same for support@?” he asked. “Right now everything lands in one big shared inbox – billing questions, bug reports, sales inquiries, the lot. Sarah ends up triaging it manually every morning and it takes forever.”

I told him it should be straightforward. Famous last words.

:inbox_tray: The starting point: just classify and route

The first version was supposed to be a no-brainer. Gmail trigger → easybits Extractor classifies the email into billing, technical, sales, or other → Switch node routes to the right Slack channel. Five-ish nodes, done.
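
As a rough illustration of what that routing step does (the actual workflow uses a Switch node, not code), here is a minimal Python sketch. The channel names are hypothetical placeholders; the post only says there are four Slack channels.

```python
# Hypothetical channel names, one per category from the post.
CHANNELS = {
    "billing": "#support-billing",
    "technical": "#support-technical",
    "sales": "#support-sales",
    "other": "#support-other",
}


def route(category: str) -> str:
    """Return the Slack channel for a category, falling back to 'other'."""
    return CHANNELS.get(category.strip().lower(), CHANNELS["other"])
```

The fallback means an unexpected label from the classifier still lands somewhere a human will see it.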

It worked. But the moment we tested it on a few real emails from his actual inbox, the limitations showed up:

  • Some emails are genuinely ambiguous – short, vague, or mixing two topics. The classifier picked one, but you couldn’t tell from the result whether it was a confident call or a guess.
  • An angry “WHERE IS MY REFUND” got the same Slack treatment as a polite “hey, quick question about annual billing.” Same channel, same notification. Sarah was still scanning the channel manually to figure out what to handle first.
  • Half of Mike’s customers write in German. The first prompt was English-only and was clearly weighting English urgency words more heavily than German ones – formal-sounding German emails were getting marked as low-priority even when the sender was clearly fed up.

So the workflow grew. In a good way, I think.

:bullseye: Adding confidence scoring

The Extractor now returns a confidence level (high / mid / low) alongside the category. Anything classified as low-confidence gets automatically rerouted to the “Other” channel with a breadcrumb showing what the model’s best guess was. That way Sarah can spot patterns over time – if low-confidence emails keep getting tagged as “billing,” that’s a signal the billing prompt needs more examples.

The trick I found here: telling the model that a confident “other” should still score high on confidence. Otherwise you get a flood of low-confidence “other” classifications that aren’t actually uncertain – those emails are just genuinely uncategorizable, and labeling them “other” is correct behavior, not a failure.
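
The rerouting rule can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the workflow's actual implementation; the function name and the `route` / `best_guess` keys are assumptions.

```python
def apply_confidence(category: str, confidence: str) -> dict:
    """Send low-confidence results to 'other', keeping the model's guess.

    `best_guess` is the breadcrumb shown in Slack; it is None when the
    classification is trusted and routed as-is.
    """
    if confidence == "low":
        return {"route": "other", "best_guess": category}
    return {"route": category, "best_guess": None}
```

Note that a high-confidence "other" passes straight through with no breadcrumb, which matches the point about confident "other" not being a failure.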

:vertical_traffic_light: Adding sentiment-based priority scoring

Then we layered in a priority field – urgent / normal / low – based on sentiment, urgency words, and business impact signals. Each Slack message now gets a priority emoji (:police_car_light: / :green_circle: / :white_circle:) and label, so Sarah can scan a busy channel and immediately see what needs attention now vs. what can wait.
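
To make the Slack formatting concrete, here is a small Python sketch of the priority-to-emoji mapping. The message layout and the function name are assumptions for illustration; the real message template lives in the workflow's Slack nodes.

```python
# Emoji per priority level, in the same order as listed in the post.
PRIORITY_EMOJI = {
    "urgent": "🚨",
    "normal": "🟢",
    "low": "⚪",
}


def format_slack_text(priority: str, category: str, summary: str) -> str:
    """Build a one-line Slack message with a priority emoji and label."""
    if priority not in PRIORITY_EMOJI:
        priority = "normal"  # assumed fallback for unexpected values
    return f"{PRIORITY_EMOJI[priority]} *{priority.upper()}* [{category}] {summary}"
```

For example, `format_slack_text("urgent", "billing", "Refund missing")` yields a line that starts with the siren emoji and a bold URGENT label.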

:globe_showing_europe_africa: Making it bilingual

The biggest unlock was making the classification AND priority scoring work equally well in English and German. Same Extractor call, same workflow – just bilingual prompts that explicitly list urgency markers, frustration words, and politeness conventions for both languages.

A few things that surprised me while writing the prompts:

  • German business correspondence is often more formal than English even when the sender is upset, so the model needs to be told not to read formality as low urgency.
  • Frustration markers like “Frechheit” or “Unverschämtheit” don’t translate one-to-one and would have been completely missed without explicit listing.
  • “Dringend” (urgent) is used more liberally in German than “urgent” is in English, so it can’t be a single-word trigger.
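
One way to keep per-language marker lists maintainable is to generate the hint section of the prompt from data. The sketch below is hypothetical and is not the prompt from the repository: only “Frechheit”, “Unverschämtheit”, and “dringend” come from this post, and the English entries are invented examples.

```python
# Marker lists per language. English entries are assumed examples.
MARKERS = {
    "German": {
        "frustration": ["Frechheit", "Unverschämtheit"],
        "weak urgency (never sufficient alone)": ["dringend"],
    },
    "English": {
        "frustration": ["unacceptable", "ridiculous"],
        "urgency": ["urgent", "asap"],
    },
}


def build_priority_hints(markers: dict[str, dict[str, list[str]]]) -> str:
    """Render marker lists as plain-text hints for a priority prompt."""
    lines = [
        "Score priority the same way in every language.",
        "Do not treat formal or polite phrasing as a sign of low urgency.",
    ]
    for language, groups in markers.items():
        for label, words in groups.items():
            lines.append(f"{language} {label}: {', '.join(words)}")
    return "\n".join(lines)
```

The returned text would be pasted into (or appended to) the priority field description, so adding a new marker becomes a one-line data change.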

For the summary field, I had it write the summary in the same language as the email – German emails get a German summary, English emails get an English one. Keeps it readable for whoever is handling it without needing translation.

One easybits call returns all four fields (category, summary, confidence, priority) in a single pass. No separate sentiment analysis service, no second model call, no extra latency.
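
If you post-process the result in code, it can help to validate the four fields before routing. The sketch below assumes the result arrives as a flat dict keyed by the four field names; the exact response shape of the Extractor node is not shown in this post, so treat that as an assumption.

```python
from dataclasses import dataclass

CATEGORIES = {"billing", "technical", "sales", "other"}
CONFIDENCES = {"high", "mid", "low"}
PRIORITIES = {"urgent", "normal", "low"}


@dataclass(frozen=True)
class Triage:
    category: str
    summary: str
    confidence: str
    priority: str


def parse_triage(raw: dict) -> Triage:
    """Normalize and validate a raw result; raise ValueError if a field is off."""
    category = str(raw.get("category", "")).strip().lower()
    confidence = str(raw.get("confidence", "")).strip().lower()
    priority = str(raw.get("priority", "")).strip().lower()
    summary = str(raw.get("summary", "")).strip()
    for name, value, allowed in (
        ("category", category, CATEGORIES),
        ("confidence", confidence, CONFIDENCES),
        ("priority", priority, PRIORITIES),
    ):
        if value not in allowed:
            raise ValueError(f"unexpected {name}: {value!r}")
    return Triage(category, summary, confidence, priority)
```

Failing loudly on an unknown value makes prompt drift visible early instead of silently misrouting an email.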

:wrench: The workflow

Here’s the full workflow – sticky notes inside walk through every step, and it’s already sanitized so you can import it directly:

:package: Field prompts

The four Extractor field prompts (category, summary, confidence, priority) are in the repository as separate markdown files – copy them straight into your easybits pipeline field descriptions:

:backhand_index_pointing_right: n8n-workflows/easybits-support-inbox-router at 269d3137aa4018543720f2ae88d0b312deda5356 · felix-sattler-easybits/n8n-workflows · GitHub

:toolbox: Setup essentials

You need the easybits Extractor community node:

  • n8n Cloud: already verified and available – just search for ‘easybits Extractor’ in the node panel
  • Self-hosted: Settings → Community Nodes → Install '@easybits/n8n-nodes-extractor'

Then connect Gmail, Slack, and easybits credentials, set up your four Slack channels, and you’re good. The sticky notes inside the workflow walk through each step.

For those of you running shared inboxes – what’s your current triage setup? Are you handling priority/urgency detection separately from category classification, or rolling them into one model call like I did here? Curious if anyone has a sharper approach for the bilingual part especially – German + English is a real pain point for European n8n users and I’d love to compare notes.

Best,
Felix

I ran into this same thing on a church project here in Ghana.

I’d roll category, urgency, confidence, and a short summary into one model call unless the inbox volume is so high that accuracy becomes a problem. The confidence score is the key part for me, because some messages should not be auto-routed with fake certainty.

For bilingual handling, I’d add real examples from past emails in both languages instead of only listing keywords. Tone carries differently across cultures, so examples usually teach the model better than rules alone.

Also, I’d keep a simple review log of low-confidence messages. That gives you the data to keep improving the prompt instead of guessing.
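
A review log like that can be very small. Here is a minimal Python sketch, assuming each log entry is a dict with `category` and `confidence` keys (the entry shape is my assumption, not something from the workflow):

```python
from collections import Counter


def low_confidence_breakdown(log: list[dict]) -> Counter:
    """Count the model's best-guess categories among low-confidence entries."""
    return Counter(
        entry["category"] for entry in log if entry["confidence"] == "low"
    )
```

If one category dominates the counts, that is the prompt to add examples to first.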

Hey @Exnav29, thanks a lot for your feedback – those are really valid points.

I actually used real-life emails to refine the sentiment analysis prompt. I implemented this for Mike last week, and we agreed to review it together with Sarah in about two weeks to see how accurate the urgency scoring is. She’ll flag any incorrect cases directly in Slack (e.g., with a thumbs down), so I can iterate on the setup if needed. That said, when we first connected it to their Gmail account, it processed 38 existing emails and assigned the urgency correctly, which gives me a lot of confidence in the current setup.

The idea of a separate review log is a good one, but in this case Slack effectively acts as the log. Each message includes the confidence score, so after a couple of weeks I should have a good sense of how well the scoring holds up.
