What “recovery-first” means in code
A failed charge enters DunningService.handleFailure(invoiceId, failureCode, rail). From there, every decision is made by DunningStrategy.decide(), a pure function in @duro/billing that takes the failure context and returns an action, never just an error.
This diagram is decide(), branch for branch. Every leaf is a DunningDecision { action, nextAttemptAt, rail, reason }. The reason string is human-readable and surfaces on the merchant’s recovery dashboard and in the dunning email.
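As a rough companion to the diagram, the sketch below shows what a pure decision function of this shape could look like. Only the four DunningDecision field names come from the text; the action names, the FailureContext input, the branch order, and the name decideSketch are assumptions made for illustration, and the do-not-honour "one more try" rule is left out for brevity.

```typescript
type Rail = "card" | "ussd" | "transfer" | "virtual_account" | "direct_debit";
type FailureCategory =
  | "insufficient_funds"
  | "expired_card"
  | "card_not_supported"
  | "do_not_honor"
  | "hard_decline"
  | "processor_error";

// Hypothetical action names; the text only says each leaf carries an `action`.
type DunningAction = "retry" | "switch_rail" | "request_new_card" | "give_up";

interface DunningDecision {
  action: DunningAction;
  nextAttemptAt: Date | null;
  rail: Rail;
  reason: string;
}

// Hypothetical input: the caller resolves the candidates so the function stays pure.
interface FailureContext {
  category: FailureCategory;
  rail: Rail;
  nextRail: Rail | null;    // next enabled rail in the fallback chain, if any
  paydayAware: boolean;
  onPayday: boolean;
  nextPayday: Date;
  nextBackoff: Date | null; // null once the retry offsets are exhausted
}

function decideSketch(ctx: FailureContext): DunningDecision {
  const { category, rail } = ctx;

  // A dead or unsupported card is never retried; the customer is asked for a new one.
  if (category === "expired_card" || category === "card_not_supported") {
    return { action: "request_new_card", nextAttemptAt: null, rail, reason: "Card cannot be charged; a new card is needed." };
  }
  // No slots left in the schedule means the strategy gives up.
  if (ctx.nextBackoff === null) {
    return { action: "give_up", nextAttemptAt: null, rail, reason: "Retry schedule exhausted." };
  }
  // A hard decline moves to the next rail instead of retrying the card.
  if (category === "hard_decline") {
    if (ctx.nextRail === null) {
      return { action: "give_up", nextAttemptAt: null, rail, reason: "Hard decline and no fallback rail left." };
    }
    return { action: "switch_rail", nextAttemptAt: ctx.nextBackoff, rail: ctx.nextRail, reason: `Card declined; trying ${ctx.nextRail}.` };
  }
  // Insufficient funds waits for payday when the merchant has payday-awareness on.
  if (category === "insufficient_funds" && ctx.paydayAware && !ctx.onPayday) {
    return { action: "retry", nextAttemptAt: ctx.nextPayday, rail, reason: "Insufficient funds; waiting for payday." };
  }
  // Everything else is treated as transient and follows the backoff schedule.
  return { action: "retry", nextAttemptAt: ctx.nextBackoff, rail, reason: "Transient failure; retrying on the backoff schedule." };
}
```

Every branch returns a full decision with a reason string, which is the property the section describes: the function never returns a bare error.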
Step 1 — classify why it failed
You cannot retry intelligently if you don’t know what went wrong. FailureCode.classify() maps the raw gateway code (and its numeric aliases) into a category:
| Category | Example codes | What it tells us |
|---|---|---|
| insufficient_funds | insufficient_funds, 51 | The customer is broke right now. Timing is everything. |
| expired_card | expired_card, 54 | The card is dead. Retrying it is pointless. |
| card_not_supported | card_not_supported | Wrong instrument. Need a new card. |
| do_not_honor | do_not_honor, 05 | Bank said no, often transiently. One more try, then move on. |
| hard_decline | stolen_card, card_not_supported | Stop using this card entirely. |
| processor_error | processor_error, timeout | Our side / the network. Pure transient. Back off and retry. |
Two predicates build on the category: requiresNewCard() (expired/unsupported → don’t retry, ask the customer) and isHardDecline() (stop hammering the card → switch rails).
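A minimal sketch of the classification step, assuming raw codes arrive as strings. The code-to-category pairs are taken from the table; the "unknown" fallback and the free-function form are assumptions, and card_not_supported (which the table lists under two categories) is mapped to its own category here.

```typescript
type FailureCategory =
  | "insufficient_funds"
  | "expired_card"
  | "card_not_supported"
  | "do_not_honor"
  | "hard_decline"
  | "processor_error"
  | "unknown"; // hypothetical fallback for codes the table does not list

// Raw gateway codes and numeric aliases, as listed in the table.
const CODE_TO_CATEGORY = new Map<string, FailureCategory>([
  ["insufficient_funds", "insufficient_funds"],
  ["51", "insufficient_funds"],
  ["expired_card", "expired_card"],
  ["54", "expired_card"],
  ["card_not_supported", "card_not_supported"],
  ["do_not_honor", "do_not_honor"],
  ["05", "do_not_honor"],
  ["stolen_card", "hard_decline"],
  ["processor_error", "processor_error"],
  ["timeout", "processor_error"],
]);

function classify(rawCode: string): FailureCategory {
  return CODE_TO_CATEGORY.get(rawCode.trim().toLowerCase()) ?? "unknown";
}

// Expired or unsupported card: do not retry, ask the customer for a new card.
function requiresNewCard(category: FailureCategory): boolean {
  return category === "expired_card" || category === "card_not_supported";
}

// Hard decline: stop charging this card and switch rails.
function isHardDecline(category: FailureCategory): boolean {
  return category === "hard_decline";
}
```

A Map is used rather than a plain object so that a stray code such as "constructor" cannot resolve to an inherited property.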
Step 2 — the payday window (Nigeria-native)
insufficient_funds is the most common failure, and the naive response — retry tomorrow — is exactly wrong for a salaried customer with an empty account on the 15th. PaydayWindow encodes the reality:
- Payday is the 28th (PAYDAY_ANCHOR_DAY = 28), with an early-month grace window of the 1st–3rd (EARLY_MONTH_DAYS = 3) for salaries that land just into the new month.
- A failed charge that isn’t on payday is rescheduled to nextPayday(), the upcoming 28th at 09:00 UTC.
- Retrying a broke customer three times before their salary arrives just burns the attempt budget. Waiting for payday is the single highest-leverage retry decision in the Nigerian market.
This behaviour is a per-merchant setting (paydayAware defaults on); a merchant billing businesses rather than salaried individuals can turn it off.
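The payday rule can be sketched as two small date helpers. The two constants and the 09:00 UTC target come from the text; PAYDAY_HOUR_UTC, isPaydayWindow, and the free-function form are hypothetical names, and treating "on payday" as "the 28th or the 1st–3rd" is one reading of the rule rather than confirmed behaviour.

```typescript
const PAYDAY_ANCHOR_DAY = 28;
const EARLY_MONTH_DAYS = 3;
const PAYDAY_HOUR_UTC = 9; // hypothetical constant name for the 09:00 UTC target

// Assumption: the window is the anchor day itself plus the early-month grace days.
function isPaydayWindow(at: Date): boolean {
  const day = at.getUTCDate();
  return day === PAYDAY_ANCHOR_DAY || day <= EARLY_MONTH_DAYS;
}

// The upcoming 28th at 09:00 UTC, strictly after `from`.
function nextPayday(from: Date): Date {
  const year = from.getUTCFullYear();
  const month = from.getUTCMonth();
  const thisMonth = new Date(Date.UTC(year, month, PAYDAY_ANCHOR_DAY, PAYDAY_HOUR_UTC));
  if (thisMonth.getTime() > from.getTime()) {
    return thisMonth;
  }
  // Date.UTC normalises month 12 to January of the following year.
  return new Date(Date.UTC(year, month + 1, PAYDAY_ANCHOR_DAY, PAYDAY_HOUR_UTC));
}
```

For a charge that fails on the 15th, isPaydayWindow returns false and nextPayday returns the 28th of the same month, so a single attempt is spent where it is most likely to succeed.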
Step 3 — rail fallback
If the card is the problem (hard decline), the answer isn’t a better-timed card retry; it’s a different rail. The same customer who can’t pay by card can almost always pay another way.

RAIL_FALLBACK = [ussd, transfer, virtual_account, direct_debit]

On a hard decline, the strategy advances to the next rail in the chain and the next attempt charges there. The merchant configures both the order and which rails are enabled. The recovery dashboard visualises this as a relay: card handing off to the rail the customer actually has.
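Advancing along the chain can be sketched as a lookup over an ordered list. The chain contents come from the text; the function name nextRail and the convention that the caller passes the merchant's already-filtered, already-ordered chain are assumptions.

```typescript
type Rail = "card" | "ussd" | "transfer" | "virtual_account" | "direct_debit";

const RAIL_FALLBACK: readonly Rail[] = ["ussd", "transfer", "virtual_account", "direct_debit"];

// Returns the rail after `current` in the chain, or null when the chain is exhausted.
// `card` is not part of the chain, so indexOf yields -1 and the search starts at index 0.
function nextRail(current: Rail, chain: readonly Rail[] = RAIL_FALLBACK): Rail | null {
  const nextIndex = chain.indexOf(current) + 1;
  return chain[nextIndex] ?? null;
}
```

A merchant who has only bank transfer enabled would pass ["transfer"] as the chain, so a declined card hands off straight to transfer and then runs out of rails.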
Step 4 — the retry schedule
For transient failures (processor error, do-not-honour after the first card retry), the schedule is a classic exponential backoff, with offsets expressed in hours. nextAttemptAt(base, attemptsMade, offsets) returns the next timestamp, or null once the offsets are exhausted, which the strategy reads as “give up.” Both the offsets and the max-attempts are per-merchant settings, so a business can make recovery as patient or as aggressive as it likes.
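A minimal sketch of the schedule lookup, assuming each offset is measured in hours from the base timestamp and that attemptsMade indexes directly into the offsets. The signature follows the text; the indexing rule and the example offsets in the usage note are illustrative, not the product's defaults.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Returns the timestamp of the next attempt, or null once the offsets are exhausted.
function nextAttemptAt(
  base: Date,
  attemptsMade: number,
  offsets: readonly number[],
): Date | null {
  const offsetHours = offsets[attemptsMade];
  if (offsetHours === undefined) {
    return null; // no slot left: the caller treats this as "give up"
  }
  return new Date(base.getTime() + offsetHours * HOUR_MS);
}
```

With illustrative offsets of [2, 4, 8] and a base of midnight UTC, the first attempt lands at 02:00, the third at 08:00, and a fourth request returns null.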
The recovery state machine
A failing invoice gets exactly one DunningSchedule, and it walks its own small state machine in lockstep with the subscription:
On recovery (recovered), the service does the thing that keeps billing-period accounting honest: it advances the subscription’s currentPeriodStart/End to the recovered invoice’s period, transitions past_due → active, and emits both subscription_recovered and subscription_payment_recovered. The customer who paid late ends up exactly where a customer who paid on time would — no skipped period, no double charge.
On exhaustion, the invoice becomes uncollectible, the subscription transitions past_due → unpaid, and dunning stops. The money is written off, visibly, on the dashboard.
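The two terminal transitions can be sketched as a pure function over a subscription snapshot. The status names past_due, active, unpaid, and uncollectible, the period fields, and the two event names come from the text; the type names, the "paid" invoice status on recovery, and applyOutcome itself are assumptions. Only the mark-unpaid escalation is modelled.

```typescript
type SubscriptionStatus = "active" | "past_due" | "unpaid";

interface BillingPeriod {
  start: Date;
  end: Date;
}

interface SubscriptionState {
  status: SubscriptionStatus;
  currentPeriodStart: Date;
  currentPeriodEnd: Date;
}

type DunningOutcome =
  | { kind: "recovered"; invoicePeriod: BillingPeriod }
  | { kind: "exhausted" };

interface OutcomeResult {
  subscription: SubscriptionState;
  invoiceStatus: "paid" | "uncollectible"; // "paid" is an assumed name
  events: string[];
}

function applyOutcome(sub: SubscriptionState, outcome: DunningOutcome): OutcomeResult {
  if (sub.status !== "past_due") {
    throw new Error(`expected a past_due subscription, got ${sub.status}`);
  }
  if (outcome.kind === "recovered") {
    // Align the subscription with the recovered invoice's period: no skipped period, no double charge.
    return {
      subscription: {
        status: "active",
        currentPeriodStart: outcome.invoicePeriod.start,
        currentPeriodEnd: outcome.invoicePeriod.end,
      },
      invoiceStatus: "paid",
      events: ["subscription_recovered", "subscription_payment_recovered"],
    };
  }
  // Exhaustion: the invoice is written off and dunning stops.
  return {
    subscription: { ...sub, status: "unpaid" },
    invoiceStatus: "uncollectible",
    events: [],
  };
}
```

Keeping the transition pure makes the period arithmetic easy to test in isolation from the database and the event bus.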
The recovery ledger
Every schedule contributes to a live RecoverySummary the merchant sees as their hero metric:
recoveredRevenue and atRiskRevenue are summed in the database (a relation-filtered SUM over invoices), not by loading rows into Node — so the dashboard stays cheap as volume grows. This is the number that replaces “total revenue” at the top of the merchant’s screen: what you almost lost, and got back.
Every knob is the merchant’s
The whole engine reads from a per-tenant StoreSettings row, so recovery behaviour is configured, not hard-coded:
| Setting | Effect |
|---|---|
| dunningEnabled | Master switch. Off → failures mark past-due but schedule no retries. |
| maxAttempts | How many tries before exhaustion. |
| retryOffsetsHours | The backoff curve. |
| paydayAware | Whether to wait for payday on insufficient-funds. |
| retryRails | The fallback chain and which rails are enabled. |
| dunningEscalation | What “out of attempts” means: cancel, pause, or mark unpaid. |
| reminderSequence | The dunning email cadence (day 0/4/8/15…). |
dunningEnabled off pauses retries and the scanner skips the schedule; custom offsets and max-attempts drive the schedule that’s written.
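To show how these settings could combine, here is a sketch of the settings row as a type, with a helper that derives the retry timestamps from it. The field names come from the table; the value types, the escalation literals, and plannedAttempts are assumptions, as is the idea that the schedule can be expanded into a list of timestamps up front.

```typescript
type Rail = "card" | "ussd" | "transfer" | "virtual_account" | "direct_debit";

interface StoreSettings {
  dunningEnabled: boolean;
  maxAttempts: number;
  retryOffsetsHours: number[];
  paydayAware: boolean;
  retryRails: Rail[];
  dunningEscalation: "cancel" | "pause" | "unpaid"; // assumed literal names
  reminderSequence: number[];                        // assumed: days after the first failure
}

const HOUR_IN_MS = 60 * 60 * 1000;

// With dunning off, no retries are scheduled; otherwise the offsets are capped by maxAttempts.
function plannedAttempts(settings: StoreSettings, failedAt: Date): Date[] {
  if (!settings.dunningEnabled) {
    return [];
  }
  return settings.retryOffsetsHours
    .slice(0, Math.max(0, settings.maxAttempts))
    .map((hours) => new Date(failedAt.getTime() + hours * HOUR_IN_MS));
}
```

A merchant with maxAttempts set to 2 and three configured offsets would get two planned attempts, while flipping dunningEnabled off yields none.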
Next: payday & rails for the worker mechanics, then money in.