What are the key fairness metrics for evaluating AI models in finance, and why can't a model satisfy all fairness criteria simultaneously?
In my CFA ethics studies on AI governance, I've encountered several fairness definitions: demographic parity, equalized odds, predictive parity. They all sound reasonable but apparently you can't achieve all of them at once. Can someone explain what each measures and why the impossibility result matters for financial practitioners?
Model fairness metrics quantify whether an AI system treats different demographic groups equitably. The challenge is that multiple mathematically reasonable definitions of fairness are mutually incompatible, a result with profound implications for financial model governance.

Three Core Metrics:

1. Demographic Parity (Statistical Parity):

P(approved | Group A) = P(approved | Group B)

Requires equal approval rates across groups, regardless of differences in qualification. A credit model approving 65% of Group A applicants and 65% of Group B applicants satisfies this metric.

2. Equalized Odds:

P(approved | qualified, Group A) = P(approved | qualified, Group B)
AND
P(approved | unqualified, Group A) = P(approved | unqualified, Group B)

Requires equal true positive rates and equal false positive rates across groups: among genuinely creditworthy applicants, approval rates must be equal, and among non-creditworthy applicants, false approval rates must also be equal. (The weaker variant that imposes only the first condition, on true positive rates, is called equal opportunity.)

3. Predictive Parity (Calibration):

P(repays | approved, Group A) = P(repays | approved, Group B)

Among approved applicants, the actual repayment rate must be equal across groups. If 90% of approved Group A borrowers repay, then 90% of approved Group B borrowers must also repay.

The Impossibility Result:

Vaultstone Credit tested its model on 50,000 applications:

| Metric | Group A (n=30K) | Group B (n=20K) | Equal? |
|---|---|---|---|
| Base default rate | 5% | 12% | No |
| Approval rate | 78% | 53% | No (fails demographic parity) |
| True positive rate | 94% | 91% | Close (near equalized odds) |
| Repayment rate of approved | 96% | 95% | Close (near predictive parity) |

When Vaultstone forced demographic parity (a 78% approval rate for both groups), the approved Group B pool included higher-risk applicants, pushing its repayment rate down to 88% versus 96% for Group A and violating predictive parity. Fixing one metric broke another.

Mathematical reason: when base rates (actual default rates) differ between groups and the model is not a perfect predictor, it is mathematically impossible to satisfy demographic parity, equalized odds, and predictive parity simultaneously. This was proven formally (Kleinberg, Mullainathan, and Raghavan, 2016; Chouldechova, 2017) and is known as the impossibility theorem of fairness.

Practical Implications for CFA Practitioners:
- Choose the fairness metric most appropriate for the use case and regulatory context
- Document the chosen metric and the rationale transparently
- Disclose which fairness criteria are not satisfied, and why
- Involve diverse stakeholders in the metric-selection decision
- Re-evaluate regularly as regulations and societal expectations evolve
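The three metrics can be computed in a few lines of Python. This is a minimal sketch on hypothetical data (the function name and the small illustrative groups are assumptions, not from any standard library); it shows how two groups with different base rates can share a true positive rate while still failing demographic parity and predictive parity.

```python
def fairness_metrics(approved, qualified):
    """Compute fairness-related rates for one demographic group.

    approved, qualified: parallel lists of 0/1 flags per applicant,
    where qualified = 1 means the applicant would actually repay.
    """
    n = len(approved)
    n_qual = sum(qualified)
    # Demographic parity compares this across groups:
    approval_rate = sum(approved) / n
    # Equalized odds compares these two across groups:
    tpr = sum(a for a, q in zip(approved, qualified) if q) / n_qual
    fpr = sum(a for a, q in zip(approved, qualified) if not q) / (n - n_qual)
    # Predictive parity compares this across groups:
    repay_of_approved = sum(q for a, q in zip(approved, qualified) if a) / sum(approved)
    return {"approval": approval_rate, "tpr": tpr, "fpr": fpr,
            "repay_of_approved": repay_of_approved}

# Hypothetical groups with different base rates (80% vs 50% creditworthy):
qualified_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
approved_a  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
qualified_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
approved_b  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

m_a = fairness_metrics(approved_a, qualified_a)
m_b = fairness_metrics(approved_b, qualified_b)
# Both groups have TPR 1.0, yet approval rates differ (0.9 vs 0.5) and
# repayment rates among approved differ (8/9 vs 1.0): with unequal base
# rates, the three criteria pull in different directions.
```

Forcing equal approval rates here would require approving more unqualified Group B applicants, which is exactly the predictive-parity violation Vaultstone observed.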