What are the key fairness metrics for evaluating AI models in finance, and why can't a model satisfy all fairness criteria simultaneously?
In my CFA ethics studies on AI governance, I've encountered several fairness definitions: demographic parity, equalized odds, predictive parity. They all sound reasonable but apparently you can't achieve all of them at once. Can someone explain what each measures and why the impossibility result matters for financial practitioners?
Model fairness metrics quantify whether an AI system treats different demographic groups equitably. The challenge is that multiple mathematically reasonable definitions of fairness are mutually incompatible, a result with profound implications for financial model governance.

Three Core Metrics:

1. Demographic Parity (Statistical Parity):

P(approved | Group A) = P(approved | Group B)

Requires equal approval rates across groups, regardless of differences in qualification. A credit model approving 65% of Group A applicants and 65% of Group B applicants satisfies this metric.

2. Equalized Odds:

P(approved | qualified, Group A) = P(approved | qualified, Group B)
AND
P(approved | unqualified, Group A) = P(approved | unqualified, Group B)

Requires equal true positive rates and equal false positive rates across groups: among genuinely creditworthy applicants, approval rates must be equal, and among non-creditworthy applicants, false approval rates must also be equal. (The weaker variant that imposes only the first condition, on true positive rates, is called equal opportunity.)

3. Predictive Parity (Calibration):

P(repays | approved, Group A) = P(repays | approved, Group B)

Among approved applicants, the actual repayment rate must be equal across groups. If 90% of approved Group A borrowers repay, then 90% of approved Group B borrowers must also repay.

The Impossibility Result:

Vaultstone Credit tested its model on 50,000 applications:

| Metric | Group A (n=30K) | Group B (n=20K) | Equal? |
|---|---|---|---|
| Base default rate | 5% | 12% | No |
| Approval rate | 78% | 53% | No (fails demographic parity) |
| True positive rate | 94% | 91% | Close (near equalized odds) |
| Repayment rate of approved | 96% | 95% | Close (near predictive parity) |

When Vaultstone forced demographic parity (a 78% approval rate for both groups), the approved Group B pool included higher-risk applicants, pushing its repayment rate down to 88% versus 96% for Group A and violating predictive parity. Fixing one metric broke another.

Mathematical reason: when base rates (actual default rates) differ between groups and the model is not a perfect predictor, it is mathematically impossible to satisfy demographic parity, equalized odds, and predictive parity simultaneously. This was proven formally (Kleinberg, Mullainathan, and Raghavan, 2016; Chouldechova, 2017) and is known as the impossibility theorem of fairness.

Practical Implications for CFA Practitioners:
- Choose the fairness metric most appropriate for the use case and regulatory context
- Document the chosen metric and the rationale transparently
- Disclose which fairness criteria are not satisfied, and why
- Involve diverse stakeholders in the metric-selection decision
- Re-evaluate regularly as regulations and societal expectations evolve
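The three metrics can be computed in a few lines of Python. This is a minimal sketch on hypothetical data (the function name and the small illustrative groups are assumptions, not from any standard library); it shows how two groups with different base rates can share a true positive rate while still failing demographic parity and predictive parity.

```python
def fairness_metrics(approved, qualified):
    """Compute fairness-related rates for one demographic group.

    approved, qualified: parallel lists of 0/1 flags per applicant,
    where qualified = 1 means the applicant would actually repay.
    """
    n = len(approved)
    n_qual = sum(qualified)
    # Demographic parity compares this across groups:
    approval_rate = sum(approved) / n
    # Equalized odds compares these two across groups:
    tpr = sum(a for a, q in zip(approved, qualified) if q) / n_qual
    fpr = sum(a for a, q in zip(approved, qualified) if not q) / (n - n_qual)
    # Predictive parity compares this across groups:
    repay_of_approved = sum(q for a, q in zip(approved, qualified) if a) / sum(approved)
    return {"approval": approval_rate, "tpr": tpr, "fpr": fpr,
            "repay_of_approved": repay_of_approved}

# Hypothetical groups with different base rates (80% vs 50% creditworthy):
qualified_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
approved_a  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
qualified_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
approved_b  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

m_a = fairness_metrics(approved_a, qualified_a)
m_b = fairness_metrics(approved_b, qualified_b)
# Both groups have TPR 1.0, yet approval rates differ (0.9 vs 0.5) and
# repayment rates among approved differ (8/9 vs 1.0): with unequal base
# rates, the three criteria pull in different directions.
```

Forcing equal approval rates here would require approving more unqualified Group B applicants, which is exactly the predictive-parity violation Vaultstone observed.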