How do AIC and BIC work for model selection, and when would they disagree?
For FRM Part I, I need to understand information criteria for choosing between competing models. I know AIC and BIC both penalize model complexity, but I'm unclear on the mechanics. Can someone explain the formulas and when they'd pick different models?
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are model selection tools that balance goodness-of-fit against model complexity. Both prevent overfitting by penalizing extra parameters.
The Formulas:
AIC = -2 × ln(L) + 2k
BIC = -2 × ln(L) + k × ln(n)
Where:
- L = maximized likelihood of the model
- k = number of estimated parameters
- n = number of observations
- Lower values are better for both
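The two formulas translate directly into code. A minimal sketch in Python (the function names and example inputs are my own, purely for illustration):

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: -2 ln(L) + 2k."""
    return -2 * log_lik + 2 * k

def bic(log_lik, k, n):
    """Bayesian Information Criterion: -2 ln(L) + k ln(n)."""
    return -2 * log_lik + k * math.log(n)

# Toy model: maximized log-likelihood -100, 4 parameters, 250 observations
print(aic(-100, 4))                # 208.0
print(round(bic(-100, 4, 250), 2)) # 222.09 -- BIC's penalty is already larger at n = 250
```

Note that both functions take the *log*-likelihood, which is what estimation software reports; you never need the raw likelihood L itself.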
Key Difference — Penalty Strength:
BIC's penalty is k × ln(n), while AIC's is 2k. Once n ≥ 8 (since ln(8) ≈ 2.08 > 2), BIC penalizes complexity MORE heavily. For typical financial datasets (n = 250+ daily observations), BIC is far stricter:
| n | AIC Penalty per Param | BIC Penalty per Param |
|---|---|---|
| 50 | 2.0 | 3.91 |
| 250 | 2.0 | 5.52 |
| 1000 | 2.0 | 6.91 |
Example — Ridgeport Quant Research:
Compare two GARCH models for S&P 500 volatility (n = 1,000 daily returns):
| Model | Parameters (k) | Log-Likelihood | AIC | BIC |
|---|---|---|---|---|
| GARCH(1,1) | 3 | -1,425.3 | 2,856.6 | 2,871.3 |
| GARCH(2,2) | 5 | -1,423.1 | 2,856.2 | 2,880.7 |
- AIC prefers GARCH(2,2) (2,856.2 < 2,856.6): the small fit improvement justifies the extra parameters
- BIC prefers GARCH(1,1) (2,871.3 < 2,880.7): the heavier penalty rejects the extra complexity
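You can verify the example by plugging the log-likelihoods and parameter counts straight into the formulas. A quick sketch (helper names are my own):

```python
import math

def aic(ll, k):
    return -2 * ll + 2 * k

def bic(ll, k, n):
    return -2 * ll + k * math.log(n)

n = 1000  # daily return observations
models = {"GARCH(1,1)": (3, -1425.3), "GARCH(2,2)": (5, -1423.1)}

for name, (k, ll) in models.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n):.1f}")

# Each criterion selects the model with the LOWEST value
best_aic = min(models, key=lambda m: aic(models[m][1], models[m][0]))
best_bic = min(models, key=lambda m: bic(models[m][1], models[m][0], n))
print(best_aic, best_bic)  # GARCH(2,2) GARCH(1,1) -- the criteria disagree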
When They Disagree:
This happens precisely when a more complex model offers a modest improvement in fit:
- AIC (lighter penalty) accepts the extra parameters
- BIC (heavier penalty) rejects them
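The disagreement zone can be made precise. Comparing a complex model with Δk extra parameters to a simpler one: AIC favors it when the log-likelihood gain ΔlnL exceeds Δk, while BIC favors it only when ΔlnL exceeds Δk × ln(n)/2. A short sketch of that algebra (function name is my own):

```python
import math

def disagreement_zone(delta_k, n):
    """Range of log-likelihood improvement where AIC and BIC disagree.

    AIC prefers the complex model when delta_lnL > delta_k;
    BIC prefers it when delta_lnL > delta_k * ln(n) / 2.
    In between, AIC accepts the extra parameters but BIC rejects them.
    """
    return delta_k, delta_k * math.log(n) / 2

lo, hi = disagreement_zone(delta_k=2, n=1000)
print(f"They disagree when the log-likelihood gain lies in ({lo:.2f}, {hi:.2f})")
```

In the Ridgeport example, ΔlnL = 1,425.3 − 1,423.1 = 2.2 with Δk = 2 and n = 1,000, which falls inside this zone, so AIC and BIC pick different models.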
In practice, BIC is consistent (it selects the true model if one exists in the candidate set), while AIC tends to select models with better prediction accuracy. For risk management, BIC is often preferred because overfitting is dangerous — an overfit VaR model will fail precisely when you need it most.
FRM Exam Tips:
- Both criteria: lower = better
- BIC penalizes more harshly for large samples
- If asked which favors parsimony: BIC
- Adjusted R-squared applies only to linear regression models (compared on the same dependent variable); AIC/BIC work for any model estimated by maximum likelihood
Test your model selection skills in our FRM Part I question bank.