AcadiFi · FairnessAudit_Jia · 2026-04-11
CFA Level I · Ethics · Portfolio Management

What are the key fairness metrics for evaluating AI models in finance, and why can't a model satisfy all fairness criteria simultaneously?

In my CFA ethics studies on AI governance, I've encountered several fairness definitions: demographic parity, equalized odds, and predictive parity. They all sound reasonable, but apparently you can't achieve all of them at once. Can someone explain what each measures and why the impossibility result matters for financial practitioners?

115 upvotes

Verified Expert · AcadiFi Certified Professional

Model fairness metrics quantify whether an AI system treats different demographic groups equitably. The challenge is that several mathematically reasonable definitions of fairness are mutually incompatible, a result with profound implications for financial model governance.

Three Core Metrics:

1. Demographic Parity (Statistical Parity):
P(approved | Group A) = P(approved | Group B)

Requires equal approval rates across groups, regardless of any differences in qualification. A credit model approving 65% of Group A and 65% of Group B satisfies this metric.

2. Equalized Odds:
P(approved | qualified, Group A) = P(approved | qualified, Group B)
AND
P(approved | unqualified, Group A) = P(approved | unqualified, Group B)

Requires equal true positive and false positive rates across groups: among genuinely creditworthy applicants, approval rates must be equal, and among non-creditworthy applicants, false approval rates must also be equal. (The related but weaker criterion of equal opportunity requires only the first condition.)

3. Predictive Parity (Calibration):
P(repays | approved, Group A) = P(repays | approved, Group B)

Among approved applicants, the actual repayment rate must be equal across groups. If 90% of approved Group A borrowers repay, then 90% of approved Group B borrowers must also repay.

The Impossibility Result:

Vaultstone Credit tested their model on 50,000 applications:

| Metric | Group A (n=30K) | Group B (n=20K) | Equal? |
|---|---|---|---|
| Base default rate | 5% | 12% | No |
| Approval rate | 78% | 53% | No (fails demographic parity) |
| True positive rate | 94% | 91% | Close (near equalized odds) |
| Repayment rate of approved | 96% | 95% | Close (near predictive parity) |

When Vaultstone forced demographic parity (78% approval for both groups), the approved Group B pool included higher-risk applicants, pushing its repayment rate to 88% versus 96% for Group A and violating predictive parity.
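The three definitions above reduce to simple rate comparisons on group-level counts. A minimal sketch in Python (function names and all example numbers are illustrative, not from any real lender's data):

```python
# Group-level fairness gaps from aggregate counts (0.0 = parity).
# All names and numbers below are hypothetical.

def demographic_parity_gap(approved_a, total_a, approved_b, total_b):
    """Difference in approval rates between groups."""
    return abs(approved_a / total_a - approved_b / total_b)

def equalized_odds_gaps(tp_a, pos_a, fp_a, neg_a, tp_b, pos_b, fp_b, neg_b):
    """Differences in true-positive and false-positive rates.

    pos_* = genuinely creditworthy applicants, neg_* = non-creditworthy;
    tp_* / fp_* = approvals within each of those pools.
    """
    tpr_gap = abs(tp_a / pos_a - tp_b / pos_b)
    fpr_gap = abs(fp_a / neg_a - fp_b / neg_b)
    return tpr_gap, fpr_gap

def predictive_parity_gap(repaid_a, approved_a, repaid_b, approved_b):
    """Difference in repayment rates among approved applicants."""
    return abs(repaid_a / approved_a - repaid_b / approved_b)

# Illustrative check against approval rates like those in the table above:
gap = demographic_parity_gap(23_400, 30_000, 10_600, 20_000)  # 78% vs 53%
print(round(gap, 2))  # 0.25
```

Auditing a real model would compute these gaps per protected attribute from held-out predictions, but the arithmetic is exactly this.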
Fixing one metric broke another.

Mathematical reason: when base rates (actual default rates) differ between groups, no non-trivial classifier can simultaneously achieve demographic parity, equalized odds, and predictive parity. This was proven formally in 2016 (by Kleinberg, Mullainathan, and Raghavan, and independently by Chouldechova) and is known as the impossibility theorem of fairness; the only exceptions are degenerate cases such as a perfect predictor or equal base rates.

Practical Implications for CFA Practitioners:
- Choose the fairness metric most appropriate for the use case and regulatory context
- Document the chosen metric and the rationale transparently
- Disclose which fairness criteria are not satisfied, and why
- Involve diverse stakeholders in the metric selection decision
- Re-evaluate regularly as regulations and societal expectations evolve
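The base-rate mechanism can be made concrete with Bayes' rule. In this sketch (all rates hypothetical; the 5% and 12% defaults echo the table above), the classifier's error rates are held identical across groups, i.e., equalized odds holds exactly, yet the repayment rate among approved applicants still diverges:

```python
# Why base rates matter: with TPR and FPR fixed across groups
# (equalized odds satisfied), Bayes' rule forces the repayment rate
# of approved applicants (predictive parity) to differ whenever
# base default rates differ. All rates are hypothetical.

def repayment_rate_of_approved(base_default_rate, tpr, fpr):
    """P(repays | approved) for a group with the given base default rate.

    'Positive' = creditworthy applicant; approval is a positive prediction,
    so tpr = P(approved | creditworthy), fpr = P(approved | will default).
    """
    p_repays = 1.0 - base_default_rate
    approved_and_repays = tpr * p_repays
    approved_and_defaults = fpr * base_default_rate
    return approved_and_repays / (approved_and_repays + approved_and_defaults)

tpr, fpr = 0.94, 0.40           # identical classifier behavior for both groups
ppv_a = repayment_rate_of_approved(0.05, tpr, fpr)  # 5% base default rate
ppv_b = repayment_rate_of_approved(0.12, tpr, fpr)  # 12% base default rate
print(round(ppv_a, 3), round(ppv_b, 3))  # ppv_a > ppv_b: predictive parity fails
```

Running the numbers shows Group A's approved pool repaying at a higher rate than Group B's even though both groups face the same decision rule, which is the impossibility theorem in miniature.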


#fairness-metrics #demographic-parity #equalized-odds #predictive-parity #impossibility-theorem