The 90-day headcount experiment
A mid-market insurer replaced most of its service team with a bot in one quarter. The rehiring cost more than the software saved — a pattern the market keeps rerunning.
One quarter, one memo
The service organization of a mid-market insurer went from 120 seats to 38 in ninety days. The decision rested on a vendor benchmark deck, a board eager for an efficiency story, and a deflection rate from a six-week trial that was, in fairness, real. The bot did handle the easy half of tickets — the half that scripted macros had already half-solved.
Nothing about the memo was reckless on its face. The numbers were sourced, the vendor was reputable, the trial had run. What the ninety days did not contain was time to learn which half of the ticket volume the trial had actually measured.
The long tail bites
The tickets that remained were the hard ones: ambiguous, emotional, regulatory. Deflection metrics stayed green while resolution times and satisfaction quietly sank, because the dashboard measured the tickets the bot took on, not the fate of the ones it could not handle. Complaints found their way to the regulator before they found their way into a KPI.
The deeper loss was internal. The remaining senior staff became the escalation dead-end for every conversation the bot had already made worse, and they left first. With them went the institutional knowledge that any future automation would have needed as its baseline.
Rehiring at a premium
Fourteen months later the function was rebuilt to roughly 80 seats — at contractor rates, with retraining costs, in a trust business now carrying visible scar tissue. Netted out, the software saved less in its first year than the reversal cost in its second. The original benchmark deck was never revisited; the reversal never made a slide.
This is the pattern our reversal indicator counts, and it has been rising for eight straight quarters. Each individual case reads as bad luck or bad vendors. In aggregate it reads as what it is: a sequencing error, repeated at market scale.
What the pattern teaches
The bot was not the failure. Deflection on the easy half was real, and a staged rollout would have banked it. The failure was ordering: the cut came before the evidence, which meant the experiment destroyed its own control group. A workflow with no human baseline left to compare against can never prove what the automation is worth, only what the baseline’s absence costs.
Cut-first automation also forecloses the cheap exit. A firm that automates alongside its team can retreat in a sprint; a firm that has already run the severance round can only retreat through a hiring market, at whatever price that market sets.
Takeaways
- 01 Trial metrics describe the traffic the trial saw: the easy half. Size the decision on the residual work, not the deflection rate.
- 02 Cuts made before a measured baseline exists can never be evaluated afterwards; the control group is gone.
- 03 Escalation is a capability, not an overhead. The people who absorb what the bot cannot are the ones a cut-first plan removes.
- 04 Model the reversal cost in the business case. If the firm cannot afford to be wrong, it cannot afford the cut.
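One way to read the last takeaway is as a line item in the business case. The sketch below is illustrative only. The seat counts (120, 38, 80) come from the case above; every monetary figure, the contractor multiple and the reversal probability are hypothetical placeholders, and the names `CutCase`, `first_year_saving`, `reversal_cost` and `expected_net` are invented for this example. It assumes the reversal cost is one year of rehired seats at contractor rates plus a one-off retraining cost per seat.

```python
from dataclasses import dataclass


@dataclass
class CutCase:
    """Inputs for a cut-first business case. All money values are placeholders."""
    seats_before: int           # seats before the cut (120 in the case above)
    seats_after: int            # seats after the cut (38 in the case above)
    seats_rebuilt: int          # seats after a reversal (roughly 80 in the case above)
    cost_per_seat: float        # hypothetical annual loaded cost of one seat
    software_cost: float        # hypothetical annual cost of the bot
    contractor_multiple: float  # hypothetical contractor rate as a multiple of cost_per_seat
    retraining_per_seat: float  # hypothetical one-off retraining cost per rehired seat
    p_reversal: float           # estimated probability that the cut is reversed


def first_year_saving(c: CutCase) -> float:
    """Payroll removed by the cut, minus what the software costs."""
    return (c.seats_before - c.seats_after) * c.cost_per_seat - c.software_cost


def reversal_cost(c: CutCase) -> float:
    """One year of rehired seats at contractor rates, plus retraining."""
    rehired = c.seats_rebuilt - c.seats_after
    contractor_pay = rehired * c.cost_per_seat * c.contractor_multiple
    return contractor_pay + rehired * c.retraining_per_seat


def expected_net(c: CutCase) -> float:
    """First-year saving with the reversal cost weighted by its probability."""
    return first_year_saving(c) - c.p_reversal * reversal_cost(c)


case = CutCase(
    seats_before=120,
    seats_after=38,
    seats_rebuilt=80,
    cost_per_seat=60_000.0,
    software_cost=1_000_000.0,
    contractor_multiple=1.5,
    retraining_per_seat=10_000.0,
    p_reversal=0.5,
)

print(f"first-year saving: {first_year_saving(case):,.0f}")
print(f"reversal cost:     {reversal_cost(case):,.0f}")
print(f"expected net:      {expected_net(case):,.0f}")
```

With these placeholder inputs the reversal costs more than the first-year saving, which is the shape of the case described above. The point of the sketch is only that the reversal term appears in the model at all; a business case that leaves it out implicitly sets `p_reversal` to zero.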
What would have worked
- Automate first, measure against the human baseline for two quarters, then re-plan roles around the residual work.
- Keep the escalation path staffed and senior: the bot’s job is to move the humans onto harder work, not to remove them.
- Tie any headcount change to measured resolution quality, not to deflection volume.
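The steps above can be sketched as a simple gate on headcount decisions. This is a minimal illustration under stated assumptions, not a recommended policy: the metric names, the tolerance thresholds and the function `headcount_change_allowed` are all hypothetical. The gate requires at least two measured quarters and compares resolution quality against the human baseline. It deliberately ignores the deflection rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuarterMetrics:
    """One quarter of service metrics. Field names are hypothetical."""
    deflection_rate: float           # share of tickets closed by the bot alone
    median_resolution_hours: float   # median time to resolve a ticket
    csat: float                      # satisfaction score between 0 and 1


def headcount_change_allowed(
    baseline: QuarterMetrics,
    observed: List[QuarterMetrics],
    max_resolution_slip: float = 0.05,  # hypothetical tolerance: 5% slower
    max_csat_drop: float = 0.02,        # hypothetical tolerance: 2 points
    min_quarters: int = 2,              # "two quarters" from the list above
) -> bool:
    """Allow a headcount change only if quality held up against the baseline.

    Deflection volume is intentionally not consulted.
    """
    if len(observed) < min_quarters:
        return False
    resolution_limit = baseline.median_resolution_hours * (1 + max_resolution_slip)
    csat_floor = baseline.csat - max_csat_drop
    return all(
        q.median_resolution_hours <= resolution_limit and q.csat >= csat_floor
        for q in observed
    )


# Hypothetical numbers for illustration only.
human_baseline = QuarterMetrics(deflection_rate=0.0, median_resolution_hours=10.0, csat=0.85)
steady = [
    QuarterMetrics(deflection_rate=0.50, median_resolution_hours=10.2, csat=0.85),
    QuarterMetrics(deflection_rate=0.55, median_resolution_hours=9.8, csat=0.86),
]
green_dashboard_sinking_quality = [
    QuarterMetrics(deflection_rate=0.60, median_resolution_hours=14.0, csat=0.78),
    QuarterMetrics(deflection_rate=0.62, median_resolution_hours=15.0, csat=0.75),
]

print(headcount_change_allowed(human_baseline, steady))
print(headcount_change_allowed(human_baseline, green_dashboard_sinking_quality))
```

The second scenario has the higher deflection rate and still fails the gate, which mirrors the case above: the deflection dashboard stayed green while resolution times and satisfaction sank.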
Cases are anonymized composites: patterns assembled from public filings, court records, interviews and post-mortems, with identifying details changed. We analyze patterns, not people.