Our model gives every employee a score 0–100%. We need a threshold above which we alert "at risk." A low threshold catches most leavers but floods managers with false alarms. A high threshold cuts noise but misses real leavers.
AUC-ROC measures model quality across ALL thresholds at once. It answers: "If I pick a random leaver and a random stayer, how often does the model rank the leaver higher?"
True Positive Rate (TPR) = Sensitivity = Recall
Of all employees who actually left, what fraction did we correctly flag?
TPR = True Positives divided by All Actual Leavers
"What % of leavers did we warn about?"
False Positive Rate (FPR) = 1 minus Specificity
Of all employees who actually stayed, what fraction did we wrongly flag?
FPR = False Positives divided by All Actual Stayers
"What % of stayers got a wasted alert?"
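To make the two rates concrete, here is a minimal sketch that computes TPR and FPR at two thresholds. The scores and outcomes are hypothetical toy data, not output from our model, and the sketch assumes an alert fires when a score is at or above the threshold.

```python
def rates_at_threshold(scores, left, threshold):
    """Return (TPR, FPR) when every score >= threshold raises an alert."""
    tp = sum(1 for s, y in zip(scores, left) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, left) if not y and s >= threshold)
    leavers = sum(1 for y in left if y)
    stayers = len(left) - leavers
    return tp / leavers, fp / stayers


# Hypothetical toy data: risk scores in [0, 1] and whether the employee left.
scores = [0.91, 0.84, 0.72, 0.65, 0.40, 0.75, 0.30, 0.22, 0.15, 0.10]
left = [True, True, True, False, True, False, False, False, False, False]

for threshold in (0.3, 0.7):
    tpr, fpr = rates_at_threshold(scores, left, threshold)
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
# threshold=0.3  TPR=1.00  FPR=0.50
# threshold=0.7  TPR=0.75  FPR=0.17
```

The low threshold catches every leaver but alerts on half the stayers. The high threshold cuts false alarms and misses one leaver in four.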
The accuracy trap
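If 85% of employees stay, a model that always predicts "will stay" scores 85% accuracy and catches zero leavers. AUC-ROC avoids this trap because it measures ranking quality, not raw correctness: a model that gives everyone the same score ranks nobody and gets an AUC of 0.5, no better than a coin flip.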
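The trap is easy to reproduce. This small sketch builds 1,000 employees with 15% attrition and evaluates the "always stays" model; it is an illustration only, with no real data involved.

```python
# 1,000 employees with a 15% attrition rate.
left = [True] * 150 + [False] * 850

# The "always predicts will stay" model: nobody is ever flagged.
flagged = [False] * len(left)

accuracy = sum(f == y for f, y in zip(flagged, left)) / len(left)
recall = sum(f and y for f, y in zip(flagged, left)) / sum(left)

print(f"accuracy={accuracy:.2f}  recall={recall:.2f}")
# accuracy=0.85  recall=0.00
```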
1,000 employees: 150 actually leave, 850 stay. Three models compared at a 70% threshold: coin flip, simple baseline, strong model.
AUC = 0.82 in plain English
Pick one leaver and one stayer at random. There is an 82% chance our model ranked the leaver higher. A coin flip would be 50%. We are 32 percentage points better than chance.
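The plain-English reading can be computed directly: count the leaver/stayer pairs the model orders correctly. The sketch below does that on hypothetical toy scores, counting ties as half a win (a common convention). It is not how production libraries compute AUC, which is done more efficiently, but it gives the same number.

```python
from itertools import product


def pairwise_auc(leaver_scores, stayer_scores):
    """Fraction of (leaver, stayer) pairs where the leaver scores higher.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for leaver, stayer in product(leaver_scores, stayer_scores):
        if leaver > stayer:
            wins += 1.0
        elif leaver == stayer:
            wins += 0.5
    return wins / (len(leaver_scores) * len(stayer_scores))


# Hypothetical toy scores, not output from our model.
leaver_scores = [0.91, 0.84, 0.72, 0.40]
stayer_scores = [0.75, 0.65, 0.30, 0.22, 0.15, 0.10]

print(pairwise_auc(leaver_scores, stayer_scores))  # 0.875 (21 of 24 pairs)
print(pairwise_auc([0.5] * 4, [0.5] * 6))          # 0.5, same score for everyone
```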
The model says Priya has 84% attrition risk. But why? Without a reason, a manager won't act, HR won't trust it, and the EU AI Act's transparency rules for high-risk uses such as employment are hard to meet.
SHAP (SHapley Additive exPlanations) uses cooperative game theory to fairly split credit for each prediction across every feature. It answers: "How much did each signal push Priya's score up or down from the average employee?"
Imagine salary gap, no promotion, and team attrition are three "players" sharing credit. Shapley (1953) proved that the only split satisfying a small set of fairness axioms is to measure each player's marginal contribution across every possible ordering, then average. The result is exact and provably fair.
1. Efficiency -- always sums to target
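All SHAP values add up exactly to (prediction minus base value), where the base value is the model's average prediction. No hidden credit. For tree classifiers the sum is typically computed in log-odds, so it needs converting before it is shown as percentage points.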
2. Symmetry -- equal features get equal credit
Two features contributing identically receive identical SHAP values.
3. Dummy -- zero-impact = zero credit
A feature that never changes the output gets SHAP = 0. No noise.
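The three properties can be checked on a toy game. The sketch below computes exact Shapley values by averaging marginal contributions over every ordering. The payoff function `risk_points` and the extra `desk_number` player are hypothetical, made up to show symmetry and the dummy property; real SHAP defines "feature absent" by averaging over background data rather than through a hand-written payoff.

```python
from itertools import permutations

PLAYERS = ["salary_gap", "no_promotion", "team_attrition", "desk_number"]


def risk_points(coalition):
    """Hypothetical risk uplift (percentage points) when only these signals are present."""
    present = set(coalition)
    points = 0
    if "salary_gap" in present:
        points += 20
    if "no_promotion" in present:
        points += 10
    if "team_attrition" in present:
        points += 10
    # Interactions: a salary gap hurts more alongside either other signal.
    if {"salary_gap", "no_promotion"} <= present:
        points += 6
    if {"salary_gap", "team_attrition"} <= present:
        points += 6
    return points  # desk_number never changes the output


def shapley_values(players, value):
    """Average each player's marginal contribution over every ordering."""
    totals = {p: 0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        seen = []
        for p in order:
            before = value(seen)
            seen.append(p)
            totals[p] += value(seen) - before
    return {p: totals[p] / len(orderings) for p in players}


phi = shapley_values(PLAYERS, risk_points)
print(phi)
# {'salary_gap': 26.0, 'no_promotion': 13.0, 'team_attrition': 13.0, 'desk_number': 0.0}
```

Efficiency: 26 + 13 + 13 + 0 = 52, the full payoff. Symmetry: no promotion and team attrition get 13 each. Dummy: desk number gets 0. Enumerating orderings grows factorially with the number of features, which is why tree models use TreeSHAP instead.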
Red bars push risk up. Green bars push risk down. Bar length equals that feature's exact SHAP contribution.
Feature weight says:
"Salary gap matters globally across all 5,000 employees"
SHAP says:
"Salary gap added +18 percentage points to Priya's score because she is 12% below market today"
Why: No salary revision in 22 months while her market rate rose ~14%. Two teammates left in 90 days.
Action: Schedule a 1:1 this week to discuss her compensation and growth path.
AUC-ROC -- the model builder's question
"Is the model good enough to trust?" -- Your quality gate, investor answer, benchmark vs Workday. Show to data scientists. Never to managers.
SHAP -- the user's question
"Why should I act on this?" -- What managers see. What HR trusts. What the EU AI Act requires. Turns a black-box number into an actionable case.
Feature engineering
Raw HRIS data into normalised signals: salary frustration index, contagion score, growth stagnation score.
XGBoost scores every employee
AUC-ROC validates ranking quality here. Platt scaling converts raw score to a calibrated probability.
TreeSHAP on every high-risk score
Runs in milliseconds. Every feature gets its exact attribution for this specific employee.
Claude translates SHAP to plain language
Top 3 SHAP values feed a prompt. Claude generates the manager nudge -- specific, contextual, actionable.
Manager acts, outcome tracked, model improves
Every intervention outcome becomes training data. The model gets smarter every week.
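The "Claude translates SHAP" step above could be sketched as follows: pick the three largest SHAP values by magnitude and format them into a prompt. The function names, the prompt wording, and all values except the +0.18 salary figure are hypothetical. The actual model call is left out because it depends on the client library in use.

```python
def top_drivers(shap_values, k=3):
    """Return the k (feature, value) pairs with the largest absolute SHAP value."""
    return sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


def build_prompt(name, risk, shap_values):
    """Format the top drivers into a plain-text prompt (hypothetical wording)."""
    lines = [f"- {feature}: {value:+.2f}" for feature, value in top_drivers(shap_values)]
    return (
        f"Employee: {name}\n"
        f"Attrition risk: {risk:.0%}\n"
        "Top drivers (SHAP contribution to risk):\n"
        + "\n".join(lines)
        + "\nWrite a short, specific, actionable nudge for this employee's manager."
    )


# Hypothetical per-employee SHAP values keyed by engineered signal.
shap_values = {
    "salary_frustration_index": 0.18,
    "contagion_score": 0.11,
    "growth_stagnation_score": 0.07,
    "tenure_years": -0.04,
    "commute_minutes": 0.01,
}

prompt = build_prompt("Priya", 0.84, shap_values)
print(prompt)
```

Sorting by absolute value keeps strong protective factors eligible for the top three, so a nudge can mention what is holding an employee in place as well as what is pushing them out.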
Q: "Can't you just use accuracy?"
No. With 15% attrition, a dummy model achieves 85% accuracy by predicting everyone stays, and catches zero leavers. AUC-ROC is a far better fit for imbalanced classification because it scores ranking, not raw hit rate.
Q: "Why SHAP and not LIME?"
LIME fits a local approximation from random samples, so the same employee can get different explanations on different runs. TreeSHAP is exact and deterministic for tree models such as our XGBoost scorer. For employment decisions, consistency is non-negotiable.
Q: "Is 0.82 AUC good enough?"
Right frame: "Is 0.82 better than what HR does today?" Annual engagement surveys score ~0.55 AUC empirically. We are 0.27 AUC points better, at scale, across thousands of employees.
Q: "What if the model is wrong?"
The SHAP explanation gives any manager the grounds to override. Human-in-the-loop is the designed mode, and human oversight is an EU AI Act requirement for high-risk systems.
On model quality:
"Our AUC of 0.82 means we rank a random leaver above a random stayer 82% of the time, versus 50% for a coin flip. Industry best-in-class is 0.75-0.80. We outperform through multi-source signal fusion and per-customer fine-tuning."
On explainability:
"Every score is decomposed using SHAP, which gives mathematically fair, additive attributions for each individual prediction. This supports EU AI Act Article 13 transparency requirements."