# Recovery & Dunning

> The subsystem Duro is built around. Classify the failure, choose an action, retry on the right rail at the right time — and count every naira back.

This is the chapter that justifies the whole project. When a charge fails, most platforms write a log line. Duro runs a **decision engine**, schedules intelligent retries, switches rails, sequences emails, and surfaces a live recovery ledger. If you read one page deeply, read this one.

## What "recovery-first" means in code

A failed charge enters `DunningService.handleFailure(invoiceId, failureCode, rail)`. From there, every decision is made by `DunningStrategy.decide()` — a pure function in `@duro/billing` that takes the failure context and returns an *action*, never just an error.

```mermaid
flowchart TD
    F["charge failed<br/>(code, rail, attemptsMade)"] --> EX{"attemptsMade ≥<br/>maxAttempts?"}
    EX -->|"yes"| EXHAUST["action: exhaust<br/>→ invoice uncollectible,<br/>subscription unpaid"]
    EX -->|"no"| CARD{"requires a new card?<br/>(expired, unsupported)"}
    CARD -->|"yes"| UPDATE["action: request_card_update<br/>→ pause, email the customer<br/>a secure update link"]
    CARD -->|"no"| HARD{"hard decline?<br/>(stolen, do_not_honor…)"}
    HARD -->|"yes, attempt 1"| RETRY1["action: retry once on card<br/>(do_not_honor is often transient)"]
    HARD -->|"yes, later"| SWITCH["action: switch_rail<br/>→ next rail in the fallback chain"]
    HARD -->|"no"| FUNDS{"insufficient funds<br/>+ payday-aware<br/>+ not payday?"}
    FUNDS -->|"yes"| PAYDAY["action: retry_payday<br/>→ schedule for next payday"]
    FUNDS -->|"no"| RETRY["action: retry<br/>→ scheduled backoff"]
```

This diagram is `decide()`, branch for branch. Every leaf is a `DunningDecision { action, nextAttemptAt, rail, reason }`. The reason string is human-readable and surfaces on the merchant's recovery dashboard and in the dunning email.
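The branch order can be sketched in TypeScript as below. This is an illustration under stated assumptions, not the `@duro/billing` source: the `FailureContext` field names, the `firstFailedAt` base, the reason strings, the immediate `nextAttemptAt` on a rail switch, and the "no rails left means exhaust" fallback are all hypothetical.

```typescript
// Hypothetical shapes: field names and reason strings are assumptions for illustration.
type Rail = "card" | "ussd" | "transfer" | "virtual_account" | "direct_debit";
type Category =
  | "insufficient_funds" | "expired_card" | "card_not_supported"
  | "do_not_honor" | "hard_decline" | "processor_error";
type Action = "exhaust" | "request_card_update" | "switch_rail" | "retry_payday" | "retry";

interface FailureContext {
  category: Category;
  rail: Rail;
  attemptsMade: number;   // assumed to include the attempt that just failed
  maxAttempts: number;
  paydayAware: boolean;
  now: Date;
  firstFailedAt: Date;    // assumed base for the backoff offsets
}

interface DunningDecision {
  action: Action;
  nextAttemptAt: Date | null;
  rail: Rail;
  reason: string;
}

const RAIL_FALLBACK: Rail[] = ["ussd", "transfer", "virtual_account", "direct_debit"];
const OFFSETS_HOURS = [0, 24, 72, 120, 168];
const HOUR_MS = 60 * 60 * 1000;

function isPayday(d: Date): boolean {
  const day = d.getUTCDate();
  return day >= 28 || day <= 3;
}

// Only called outside the payday window (days 4..27), so this month's 28th is still ahead.
function nextPayday(d: Date): Date {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 28, 9));
}

function backoff(ctx: FailureContext): Date | null {
  const hours = OFFSETS_HOURS[ctx.attemptsMade];
  return hours === undefined ? null : new Date(ctx.firstFailedAt.getTime() + hours * HOUR_MS);
}

function decide(ctx: FailureContext): DunningDecision {
  const { category, rail, now } = ctx;
  if (ctx.attemptsMade >= ctx.maxAttempts) {
    return { action: "exhaust", nextAttemptAt: null, rail, reason: "Retry budget used up" };
  }
  if (category === "expired_card" || category === "card_not_supported") {
    return { action: "request_card_update", nextAttemptAt: null, rail, reason: "Card must be replaced" };
  }
  if (category === "hard_decline" || category === "do_not_honor") {
    if (ctx.attemptsMade <= 1) {
      return { action: "retry", nextAttemptAt: backoff(ctx), rail, reason: "Decline may be transient; one more card attempt" };
    }
    // "card" is not in the chain, so indexOf returns -1 and the hand-off starts at ussd.
    const next = RAIL_FALLBACK[RAIL_FALLBACK.indexOf(rail) + 1];
    if (next !== undefined) {
      return { action: "switch_rail", nextAttemptAt: now, rail: next, reason: "Hard decline; trying " + next };
    }
    return { action: "exhaust", nextAttemptAt: null, rail, reason: "No rails left" }; // assumption
  }
  if (category === "insufficient_funds" && ctx.paydayAware && !isPayday(now)) {
    return { action: "retry_payday", nextAttemptAt: nextPayday(now), rail, reason: "Waiting for payday" };
  }
  const at = backoff(ctx);
  return at === null
    ? { action: "exhaust", nextAttemptAt: null, rail, reason: "Retry offsets used up" }
    : { action: "retry", nextAttemptAt: at, rail, reason: "Transient failure; backing off" };
}
```

Because the function only reads its argument, the same context always yields the same decision, which is what makes each branch easy to unit-test in isolation.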

## Step 1 — classify *why* it failed

You cannot retry intelligently if you don't know what went wrong. `FailureCode.classify()` maps the raw gateway code (and its numeric aliases) into a category:

| Category             | Example codes                       | What it tells us                                             |
| -------------------- | ----------------------------------- | ------------------------------------------------------------ |
| `insufficient_funds` | `insufficient_funds`, `51`          | The customer is broke *right now*. Timing is everything.     |
| `expired_card`       | `expired_card`, `54`                | The card is dead. Retrying it is pointless.                  |
| `card_not_supported` | `card_not_supported`                | Wrong instrument. Need a new card.                           |
| `do_not_honor`       | `do_not_honor`, `05`                | Bank said no, often transiently. One more try, then move on. |
| `hard_decline`       | `stolen_card`                       | Stop using this card entirely.                               |
| `processor_error`    | `processor_error`, `timeout`        | Our side / the network. Pure transient. Back off and retry.  |

Two predicates ride on the category: `requiresNewCard()` (expired/unsupported → don't retry, ask the customer) and `isHardDecline()` (stop hammering the card → switch rails).
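A minimal sketch of the classifier and its two predicates, assuming a plain lookup table. The codes come from the table above; the `unknown` fallback, the normalisation of the raw code, and treating `do_not_honor` as a hard decline (as the decision diagram does) are assumptions for illustration.

```typescript
type FailureCategory =
  | "insufficient_funds" | "expired_card" | "card_not_supported"
  | "do_not_honor" | "hard_decline" | "processor_error"
  | "unknown"; // hypothetical fallback for codes the table does not list

// Raw gateway codes and numeric aliases, keyed in lower case.
const CODE_TO_CATEGORY: Record<string, FailureCategory> = {
  insufficient_funds: "insufficient_funds",
  "51": "insufficient_funds",
  expired_card: "expired_card",
  "54": "expired_card",
  card_not_supported: "card_not_supported",
  do_not_honor: "do_not_honor",
  "05": "do_not_honor",
  stolen_card: "hard_decline",
  processor_error: "processor_error",
  timeout: "processor_error",
};

function classify(rawCode: string): FailureCategory {
  const key = rawCode.trim().toLowerCase();
  // hasOwnProperty guards against inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(CODE_TO_CATEGORY, key)
    ? CODE_TO_CATEGORY[key]
    : "unknown";
}

// Expired or unsupported card: retrying is pointless, the customer must supply a new one.
function requiresNewCard(category: FailureCategory): boolean {
  return category === "expired_card" || category === "card_not_supported";
}

// Stop charging this card and move to another rail.
function isHardDecline(category: FailureCategory): boolean {
  return category === "hard_decline" || category === "do_not_honor";
}
```

Keeping the mapping as data rather than a chain of conditionals means a new gateway alias is a one-line change.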

## Step 2 — the payday window (Nigeria-native)

`insufficient_funds` is the most common failure, and the naive response — retry tomorrow — is exactly wrong for a salaried customer with an empty account on the 15th. `PaydayWindow` encodes the reality:

```mermaid
flowchart LR
    FAIL["insufficient_funds<br/>on the 15th"] --> CHECK{"PaydayWindow.isPayday(now)?<br/>day ≥ 28 OR day ≤ 3"}
    CHECK -->|"no"| WAIT["nextPayday() → the 28th, 09:00 UTC<br/>retry then, not before"]
    CHECK -->|"yes"| NOW["money likely landed —<br/>retry now on backoff"]
```

* **Payday is the 28th** (`PAYDAY_ANCHOR_DAY = 28`), with an early-month grace window of the 1st–3rd (`EARLY_MONTH_DAYS = 3`) for salaries that land just into the new month.
* An `insufficient_funds` failure that lands outside the payday window is rescheduled to `nextPayday()`: the upcoming 28th at 09:00 UTC.
* Retrying a broke customer three times before their salary arrives just burns the attempt budget. Waiting for payday is the single highest-leverage retry decision in the Nigerian market.

This is configurable per merchant (`paydayAware` defaults on); a merchant billing businesses rather than salaried individuals can turn it off.
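The window check and the reschedule can be sketched as two small functions. The constants are the ones named above; reading the day of the month in UTC (rather than Lagos time) and rolling forward to next month's 28th when this month's anchor has already passed are assumptions of this sketch.

```typescript
const PAYDAY_ANCHOR_DAY = 28;
const EARLY_MONTH_DAYS = 3;
const PAYDAY_HOUR_UTC = 9;

// True on the 28th through month end, and on the 1st to the 3rd.
function isPayday(now: Date): boolean {
  const day = now.getUTCDate();
  return day >= PAYDAY_ANCHOR_DAY || day <= EARLY_MONTH_DAYS;
}

// The next 28th at 09:00 UTC that is strictly after `now`.
function nextPayday(now: Date): Date {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  const thisMonth = new Date(Date.UTC(year, month, PAYDAY_ANCHOR_DAY, PAYDAY_HOUR_UTC));
  if (thisMonth.getTime() > now.getTime()) {
    return thisMonth;
  }
  // Date.UTC normalises month 12 to January of the following year.
  return new Date(Date.UTC(year, month + 1, PAYDAY_ANCHOR_DAY, PAYDAY_HOUR_UTC));
}

const failedAt = new Date(Date.UTC(2024, 4, 15, 12)); // 15 May 2024, 12:00 UTC
console.log(isPayday(failedAt));                  // false
console.log(nextPayday(failedAt).toISOString());  // 2024-05-28T09:00:00.000Z
```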

## Step 3 — rail fallback

If the **card** is the problem (hard decline), a better-timed card retry won't help; the next attempt needs a different rail. A customer who can't pay by card can often pay another way.

```mermaid
flowchart LR
    CARD["card ✗"] --> USSD["USSD"]
    USSD --> XFER["bank transfer"]
    XFER --> VA["virtual account"]
    VA --> DD["direct debit"]
    style CARD fill:#fee,stroke:#c33
    style USSD fill:#efe,stroke:#3c3
```

`RAIL_FALLBACK = [ussd, transfer, virtual_account, direct_debit]`. On a hard decline, the strategy advances to the next rail in the chain and the next attempt charges *there*. The merchant configures both the order and which rails are enabled. The recovery dashboard visualises this as a relay — card handing off to the rail the customer actually has.
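Advancing along the chain might look like the sketch below. The `nextRail` helper, its parameters, and the way disabled rails are skipped are hypothetical; only the default chain order comes from the text above.

```typescript
type Rail = "card" | "ussd" | "transfer" | "virtual_account" | "direct_debit";

const RAIL_FALLBACK: Rail[] = ["ussd", "transfer", "virtual_account", "direct_debit"];

// Returns the first enabled rail after `current` in `chain`, or null when the chain is spent.
function nextRail(current: Rail, chain: Rail[], enabled: Rail[]): Rail | null {
  // "card" is not part of the chain, so indexOf gives -1 and the search starts at index 0.
  const start = chain.indexOf(current) + 1;
  for (let i = start; i < chain.length; i++) {
    if (enabled.indexOf(chain[i]) !== -1) {
      return chain[i];
    }
  }
  return null;
}

console.log(nextRail("card", RAIL_FALLBACK, RAIL_FALLBACK));                 // "ussd"
console.log(nextRail("ussd", RAIL_FALLBACK, ["virtual_account"]));           // "virtual_account"
console.log(nextRail("direct_debit", RAIL_FALLBACK, RAIL_FALLBACK));         // null
```

Passing the chain and the enabled set as arguments keeps the helper independent of where the merchant's configuration is stored.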

## Step 4 — the retry schedule

For transient failures (processor error, do-not-honour after the first card retry), the schedule is a fixed backoff, expressed as offsets in hours:

```
RetrySchedule.DEFAULT_OFFSETS_HOURS = [0, 24, 72, 120, 168]   // now, +1d, +3d, +5d, +7d
MAX_ATTEMPTS = 5
```

`nextAttemptAt(base, attemptsMade, offsets)` returns the next timestamp, or `null` once the offsets are exhausted — which the strategy reads as "give up." Both the offsets and the max-attempts are **per-merchant settings**, so a business can make recovery as patient or as aggressive as it likes.
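A minimal sketch of that lookup, assuming `base` is the time of the first failure and `attemptsMade` indexes directly into the offsets. Both are assumptions; the text above fixes only the signature and the `null` result.

```typescript
const DEFAULT_OFFSETS_HOURS: number[] = [0, 24, 72, 120, 168];
const HOUR_MS = 60 * 60 * 1000;

// Returns base + offsets[attemptsMade] hours, or null once the offsets run out.
function nextAttemptAt(
  base: Date,
  attemptsMade: number,
  offsets: number[] = DEFAULT_OFFSETS_HOURS
): Date | null {
  if (attemptsMade < 0 || attemptsMade >= offsets.length) {
    return null; // the strategy reads this as "give up"
  }
  return new Date(base.getTime() + offsets[attemptsMade] * HOUR_MS);
}

const firstFailure = new Date(Date.UTC(2024, 4, 1, 8));
const second = nextAttemptAt(firstFailure, 1);
console.log(second === null ? null : second.toISOString()); // 2024-05-02T08:00:00.000Z
console.log(nextAttemptAt(firstFailure, 5));                // null
```

A merchant with a shorter curve, say `[0, 12, 48]`, runs out of offsets after three attempts no matter how high `maxAttempts` is set, so the two settings are best changed together.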

## The recovery state machine

A failing invoice gets exactly one `DunningSchedule`, and it walks its own small state machine in lockstep with the subscription:

```mermaid
stateDiagram-v2
    [*] --> scheduled: handleFailure
    scheduled --> in_flight: a retry is processing
    in_flight --> recovered: charge succeeded
    in_flight --> scheduled: failed, more attempts
    in_flight --> exhausted: failed, out of attempts
    scheduled --> paused: needs card update
    paused --> [*]: customer updates card
    recovered --> [*]
    exhausted --> [*]
```
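One way to keep such a machine honest is a transition table that rejects any edge the diagram does not draw. The sketch below is illustrative: the `transition` helper and its error message are hypothetical, while the states and edges are copied from the diagram.

```typescript
type ScheduleState = "scheduled" | "in_flight" | "paused" | "recovered" | "exhausted";

// Allowed edges, copied from the state diagram. Terminal states have no outgoing edges.
const TRANSITIONS: Record<ScheduleState, ScheduleState[]> = {
  scheduled: ["in_flight", "paused"],
  in_flight: ["recovered", "scheduled", "exhausted"],
  paused: [],
  recovered: [],
  exhausted: [],
};

// Returns the new state, or throws when the edge is not in the table.
function transition(from: ScheduleState, to: ScheduleState): ScheduleState {
  if (TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error("Illegal dunning transition: " + from + " -> " + to);
  }
  return to;
}

let state: ScheduleState = "scheduled";
state = transition(state, "in_flight"); // a retry is processing
state = transition(state, "scheduled"); // failed, more attempts
state = transition(state, "in_flight");
state = transition(state, "recovered"); // charge succeeded
```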

<a id="recover" />

**On recovery** (`recovered`), the service does the thing that keeps billing-period accounting honest: it advances the subscription's `currentPeriodStart/End` to the **recovered invoice's** period, transitions `past_due → active`, and emits both `subscription_recovered` and `subscription_payment_recovered`. The customer who paid late ends up exactly where a customer who paid on time would — no skipped period, no double charge.

**On exhaustion**, the invoice becomes `uncollectible`, the subscription transitions `past_due → unpaid` (or is cancelled or paused, if the merchant's `dunningEscalation` setting says so), and dunning stops. The money is written off, visibly, on the dashboard.

## The recovery ledger

Every schedule contributes to a live `RecoverySummary` the merchant sees as their hero metric:

```mermaid
flowchart LR
    SCHEDULES["dunning_schedules<br/>grouped by state + failureCode"] --> SUMMARY["RecoverySummary"]
    SUMMARY --> M1["inFlight — money at risk"]
    SUMMARY --> M2["recovered — money brought back"]
    SUMMARY --> M3["exhausted — money lost"]
    SUMMARY --> M4["recoveryRate = recovered / (recovered + exhausted)"]
    SUMMARY --> M5["byFailureCode — what's killing you"]
```

`recoveredRevenue` and `atRiskRevenue` are summed **in the database** (a relation-filtered `SUM` over invoices), not by loading rows into Node — so the dashboard stays cheap as volume grows. This is the number that replaces "total revenue" at the top of the merchant's screen: *what you almost lost, and got back.*
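The fold from grouped rows to a summary can be sketched as follows. The row shape, the use of schedule counts rather than amounts, counting every non-terminal state as in flight, and returning a rate of 0 when nothing has settled are all assumptions; the `recoveryRate` formula is the one in the diagram.

```typescript
type ScheduleState = "scheduled" | "in_flight" | "paused" | "recovered" | "exhausted";

// One row per (state, failureCode) group, as a GROUP BY query might return it (assumed shape).
interface ScheduleGroup {
  state: ScheduleState;
  failureCode: string;
  count: number;
}

interface RecoverySummary {
  inFlight: number;
  recovered: number;
  exhausted: number;
  recoveryRate: number;
  byFailureCode: Record<string, number>;
}

function summarize(groups: ScheduleGroup[]): RecoverySummary {
  let inFlight = 0;
  let recovered = 0;
  let exhausted = 0;
  const byFailureCode: Record<string, number> = {};
  for (const g of groups) {
    if (g.state === "recovered") recovered += g.count;
    else if (g.state === "exhausted") exhausted += g.count;
    else inFlight += g.count; // assumption: scheduled, in_flight and paused all count as at risk
    byFailureCode[g.failureCode] = (byFailureCode[g.failureCode] || 0) + g.count;
  }
  const settled = recovered + exhausted;
  return {
    inFlight,
    recovered,
    exhausted,
    recoveryRate: settled === 0 ? 0 : recovered / settled, // avoid dividing by zero
    byFailureCode,
  };
}

const summary = summarize([
  { state: "recovered", failureCode: "insufficient_funds", count: 3 },
  { state: "exhausted", failureCode: "expired_card", count: 1 },
  { state: "scheduled", failureCode: "insufficient_funds", count: 2 },
]);
console.log(summary.recoveryRate); // 0.75
```

Schedules still in flight are left out of the rate on purpose: they have neither been recovered nor lost yet.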

## Every knob is the merchant's

The whole engine reads from a per-tenant `StoreSettings` row, so recovery behaviour is configured, not hard-coded:

| Setting             | Effect                                                               |
| ------------------- | -------------------------------------------------------------------- |
| `dunningEnabled`    | Master switch. Off → failures mark past-due but schedule no retries. |
| `maxAttempts`       | How many tries before exhaustion.                                    |
| `retryOffsetsHours` | The backoff curve.                                                   |
| `paydayAware`       | Whether to wait for payday on insufficient-funds.                    |
| `retryRails`        | The fallback chain and which rails are enabled.                      |
| `dunningEscalation` | What "out of attempts" means: cancel, pause, or mark unpaid.         |
| `reminderSequence`  | The dunning email cadence (day 0/4/8/15…).                           |

Changes to these settings take effect on the next decision. This has been verified live: flipping `dunningEnabled` off pauses retries and the scanner skips the schedule, while custom offsets and max-attempts drive the schedule that gets written.
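For orientation, the settings row might be typed roughly as below. Only the setting names and the documented defaults (`paydayAware` on, five attempts, the default offsets and rail chain) come from this page; the field types, the escalation value names, the other defaults, and both helper functions are assumptions.

```typescript
type Rail = "card" | "ussd" | "transfer" | "virtual_account" | "direct_debit";

interface StoreSettings {
  dunningEnabled: boolean;
  maxAttempts: number;
  retryOffsetsHours: number[];
  paydayAware: boolean;
  retryRails: Rail[];
  dunningEscalation: "cancel" | "pause" | "unpaid"; // value names are assumptions
  reminderSequence: number[];                       // assumed to be day offsets
}

const DEFAULT_SETTINGS: StoreSettings = {
  dunningEnabled: true,                  // assumed default
  maxAttempts: 5,
  retryOffsetsHours: [0, 24, 72, 120, 168],
  paydayAware: true,                     // documented as defaulting on
  retryRails: ["ussd", "transfer", "virtual_account", "direct_debit"],
  dunningEscalation: "unpaid",           // assumed default
  reminderSequence: [0, 4, 8, 15],       // only the days named in the table; the real cadence may continue
};

// Hypothetical helper: merchant overrides win, everything else falls back to the defaults.
function resolveSettings(overrides: Partial<StoreSettings>): StoreSettings {
  return { ...DEFAULT_SETTINGS, ...overrides };
}

// Hypothetical gate mirroring the master switch: off means mark past-due, schedule nothing.
function shouldScheduleRetries(settings: StoreSettings): boolean {
  return settings.dunningEnabled && settings.maxAttempts > 0;
}

const b2bMerchant = resolveSettings({ paydayAware: false, maxAttempts: 3 });
console.log(b2bMerchant.paydayAware, b2bMerchant.maxAttempts); // false 3
```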

Next: [payday & rails](/billing/payday-and-rails) for the worker mechanics, then [money in](/payments/checkout).