solid approach! idempotency is one of those things most people skip until they’ve sent duplicate emails or charged someone twice lol. we do something similar with redis as the dedup store — are you using a database or in-memory for the event key lookup?
@neshkito makes sense — check-before-mutation is the only safe order when the side effect can’t be undone. redis fits well here because the atomic SET NX gives you the check and the lock in one operation, so you don’t have a race window even across concurrent executions. sharing dedup state across multiple workflow instances is another benefit of keeping it external — scaling horizontally doesn’t break the idempotency layer.
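for anyone landing here later, the whole gate is one call. rough sketch in python with redis-py (the `dedup:` key prefix and TTL are just examples, not a prescribed format):

```python
import redis

r = redis.Redis()  # assume a shared redis instance reachable by all workers

def try_acquire(event_key: str, ttl_seconds: int = 86400) -> bool:
    """Claim an event key atomically. Returns True only for the first caller.

    SET with nx=True writes the key only if it doesn't already exist, and
    ex= attaches the TTL in the same atomic operation, so there is no gap
    between the existence check and the lock, even across concurrent workers.
    """
    return r.set(f"dedup:{event_key}", "1", nx=True, ex=ttl_seconds) is not None

# gate the side effect on the claim
if try_acquire("evt_12345"):
    ...  # first delivery: safe to run the mutation
else:
    ...  # duplicate: skip, the work already ran (or is running)
```

the TTL doubles as cleanup, so the dedup store never needs manual pruning; just size it to comfortably outlast the provider's retry window.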
nice, community node is a much cleaner install than wiring up HTTP + IF yourself. still on redis here (SET NX + TTL), that's been working well for the use cases we have. does it expose the event key so you can plug in your own dedup logic, or is the key format fixed?
The key isn’t fixed - run_id is a free-form field you can set to anything ({{ $execution.id }}, your own webhook event ID, etc.). AARI uses it to group related actions in the audit trail, not for dedup itself.

If you need dedup at the gate level (block duplicate calls with the same key), that’s idempotency_key - currently not exposed in the node UI, but it’s in the underlying API. Worth adding as an optional field?

What’s your dedup use case - webhook retries or something else?
mostly webhook retries from providers with at-least-once delivery — stripe, shopify, etc. we use the event id from the payload as the key since that's stable across retries. exposing idempotency_key as optional would be useful; expression binding makes it flexible enough to handle different key formats per integration.
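for concreteness, roughly how the key gets derived per provider (python sketch; stripe's top-level event id and shopify's X-Shopify-Webhook-Id header are the real stable identifiers, the rest is illustrative):

```python
import hashlib
import json

def dedup_key(provider: str, headers: dict, payload: dict) -> str:
    """Derive a stable dedup key from the provider's own event identity."""
    if provider == "stripe":
        # stripe re-sends the same event object on retry, so the
        # top-level id ("evt_...") is stable across deliveries
        return f"stripe:{payload['id']}"
    if provider == "shopify":
        # shopify re-sends with the same X-Shopify-Webhook-Id header
        return f"shopify:{headers['X-Shopify-Webhook-Id']}"
    # fallback for providers without a stable id: hash the canonical body
    body = json.dumps(payload, sort_keys=True).encode()
    return f"{provider}:{hashlib.sha256(body).hexdigest()}"
```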
Yep, that’s the ideal setup - stable event ID → idempotency key.
That’s exactly where duplicate execution issues show up (at-least-once delivery + retries).
Exposing idempotency_key as an optional, expression-based field in the node UI makes sense - keeps it flexible across providers without adding complexity for simpler flows.
nice, will test it once it's out. expression binding makes it drop-in across different providers without having to rewire the dedup key per integration.
Yeah exactly. Using event_id as the key is the cleanest way to handle retries.
Redis SET NX + TTL works well there, especially when everything is coming from the same source.
Where it got messy for us was when retries came from multiple layers (provider + client + queue) and the same logical action ended up with slightly different payload shapes.
Then dedup becomes less about “same key” and more about “same intent”.
Curious if you’ve run into that or if your flows are mostly single-source?
mostly single-source for us — stripe and shopify where the event id stays stable. did hit the multi-layer thing once with a custom batch pipeline where the same event came from the provider retry and our dead-letter queue with slightly different wrapping. ended up normalizing before the gate: strip the envelope fields, hash only the stable business identifiers. once you treat it as schema normalization instead of key matching it's much cleaner, but it's only worth the complexity if you're running multiple concurrent retry paths.
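rough shape of the normalization step, in case it helps someone (python sketch; the envelope field names are made up for illustration):

```python
import hashlib
import json

# wrapper fields that differ between retry paths (provider retry vs. DLQ
# replay) but don't change the business meaning; names are illustrative
ENVELOPE_FIELDS = {"delivery_id", "attempt", "received_at", "queue_meta"}

def intent_key(event: dict) -> str:
    """Hash only the stable business identifiers, so the same logical
    action dedupes even when each retry path wraps it differently."""
    core = {k: v for k, v in event.items() if k not in ENVELOPE_FIELDS}
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

sorted keys + fixed separators matter here: the hash has to be byte-identical for the same business content regardless of which layer serialized it.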
yeah, we hit that too — mostly with state drift. the gate handles “has this event been processed”, but by the time a retry arrives the underlying resource might have moved on. cancelled order, already-refunded payment, user already off-boarded. we added a state check after the gate as a separate step: pull current state, validate preconditions, then run. cleaner to keep the two concerns separate rather than baking both into the dedup logic.
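the shape of it, for anyone who wants to copy the pattern (python sketch; the states and helper functions are placeholders, not our actual code):

```python
from enum import Enum

class OrderState(Enum):
    PENDING = "pending"
    CANCELLED = "cancelled"
    REFUNDED = "refunded"

def already_processed(event_key: str) -> bool:
    ...  # the dedup gate (redis SET NX or whatever store you use)

def fetch_current_state(order_id: str) -> OrderState:
    ...  # pull live state from the system of record

def handle_refund(order_id: str, event_key: str) -> None:
    # concern 1: have we seen this exact event before?
    if already_processed(event_key):
        return
    # concern 2: does the resource still accept this action? a retry can
    # arrive after the order was cancelled or the payment already refunded
    if fetch_current_state(order_id) in (OrderState.CANCELLED, OrderState.REFUNDED):
        return
    ...  # preconditions hold, run the side effect
```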
that’s a more complete solution than what I have — adding a context-aware decision layer on top of dedup is where things actually get interesting. curious what you use as the “context” signal there — state from prior runs, or something external like a status field? will report back once this has run in prod for a bit.
makes sense to keep it lightweight — if the gate adds latency, the whole premise breaks. the “given what just happened” framing is basically a minimal FSM check, which is usually enough for the common failure cases anyway. will report back, our setup has some interesting edge cases around partial retries so should be a decent stress test.
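fwiw the minimal FSM check can literally be a transition table (python sketch; the states and actions are illustrative, the real table comes from your domain):

```python
# legal (state, action) -> next state; anything missing is rejected.
ALLOWED = {
    ("pending", "charge"): "charged",
    ("charged", "refund"): "refunded",
    ("pending", "cancel"): "cancelled",
}

def next_state(current: str, action: str) -> str | None:
    """Return the new state if the action is legal from `current`, else
    None, so a stale retry (e.g. refunding an already-refunded payment)
    is rejected by the table instead of by ad-hoc checks."""
    return ALLOWED.get((current, action))
```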