Regularization is one of the most practical additions to the CFA curriculum. Here's why it matters for finance and how to think about it.
## The Problem with OLS in Finance
Ordinary least squares (OLS) regression minimizes the sum of squared residuals — and nothing else. In finance, this creates two problems:
1. **Overfitting:** With many potential predictors (P/E, P/B, momentum, volatility, dividend yield, earnings revisions, etc.), OLS fits the training data closely but performs poorly out of sample. The model captures noise, not signal.
2. **Multicollinearity:** Financial variables are often highly correlated (e.g., P/E and earnings yield are near-perfect inverses). OLS coefficient estimates become unstable and uninterpretable.
## How Regularization Helps
Regularization adds a **penalty term** to the OLS objective function that discourages large coefficient values:
**OLS:** Minimize Sum(residuals^2)
**Ridge:** Minimize Sum(residuals^2) + lambda x Sum(beta_j^2)
**LASSO:** Minimize Sum(residuals^2) + lambda x Sum(|beta_j|)
The parameter **lambda** controls the strength of the penalty. Higher lambda = more shrinkage = simpler model.
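To make the shrinkage concrete, here's a minimal numpy sketch. Ridge has a closed-form solution, (X'X + lambda*I)^(-1) X'y, so we can compute it directly (LASSO has no closed form and needs an iterative solver). The simulated data and lambda value are illustrative, not from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.25]  # only 3 of 10 predictors matter
y = X @ beta_true + rng.standard_normal(n)

def ridge_coefs(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^(-1) X'y.
    lam = 0 recovers ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ols = ridge_coefs(X, y, 0.0)
ridge = ridge_coefs(X, y, 10.0)

# Higher lambda pulls the whole coefficient vector toward zero
print("OLS norm:  ", np.linalg.norm(ols))
print("Ridge norm:", np.linalg.norm(ridge))
```

The norm of the ridge solution is always smaller than the OLS norm for lambda > 0 — that's the shrinkage the penalty buys.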
## Ridge vs. LASSO
| Feature | Ridge (L2) | LASSO (L1) |
|---|---|---|
| Penalty type | Sum of squared coefficients | Sum of absolute coefficients |
| Coefficient behavior | Shrinks toward zero but never exactly zero | Can force coefficients to exactly zero |
| Variable selection | No — keeps all predictors | Yes — eliminates irrelevant ones |
| Multicollinearity | Handles well (distributes weight) | Picks one from correlated group, zeros others |
| Best when | Many small effects | Few important predictors |
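The multicollinearity row of the table is easy to demonstrate. In this sketch (synthetic data; penalty strengths chosen for illustration), two predictors are near-duplicates of the same underlying signal — scikit-learn's `Lasso` typically keeps one and zeroes the other, while `Ridge` spreads weight across both:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n = 300
z = rng.standard_normal(n)  # the true underlying signal

# Columns 0 and 1 are nearly collinear; column 2 is irrelevant noise
X = np.column_stack([
    z + 0.01 * rng.standard_normal(n),
    z + 0.01 * rng.standard_normal(n),
    rng.standard_normal(n),
])
y = 2 * z + rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("LASSO coefficients:", lasso.coef_)  # typically zeroes one of the pair
print("Ridge coefficients:", ridge.coef_)  # weight shared across the pair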
## Finance Example: Cross-Sectional Equity Return Prediction
**Setup:** You're building a model to predict next-month stock returns for 500 stocks in the Russell 1000 using 25 candidate factors.
**Using OLS:** You run the regression and get coefficients for all 25 factors. In-sample R-squared looks impressive at 12%. Out-of-sample? The model actually has *negative* predictive power. It memorized noise.
**Using Ridge (lambda = 0.5):** All 25 coefficients are shrunk toward zero. The model is more stable — out-of-sample R-squared improves to 1.8%. Not exciting, but positive.
**Using LASSO (lambda = 0.3):** The model zeroes out 18 of the 25 factors and keeps 7: earnings momentum, short interest, book-to-market, 12-month momentum (excluding last month), earnings surprise, analyst revision breadth, and accruals. Out-of-sample R-squared: 2.3%.
The LASSO result is more interpretable — you know exactly which factors matter — and slightly more powerful because it eliminated noise factors.
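The in-sample/out-of-sample gap described above is easy to reproduce on synthetic data. This sketch (factor count and penalty values are illustrative, not the article's actual model) fits OLS, Ridge, and LASSO on a small training window with 25 candidate factors, only 5 of which carry real signal, and compares R-squared in and out of sample:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score

rng = np.random.default_rng(4)
n_train, n_test, p = 120, 2000, 25
X = rng.standard_normal((n_train + n_test, p))
beta = np.zeros(p)
beta[:5] = 0.3  # only 5 of the 25 candidate factors carry signal
y = X @ beta + rng.standard_normal(n_train + n_test)
Xtr, Xte = X[:n_train], X[n_train:]
ytr, yte = y[:n_train], y[n_train:]

r2_in, r2_out = {}, {}
for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=10.0)),
                    ("LASSO", Lasso(alpha=0.05))]:
    model.fit(Xtr, ytr)
    r2_in[name] = r2_score(ytr, model.predict(Xtr))    # in-sample
    r2_out[name] = r2_score(yte, model.predict(Xte))   # out-of-sample

print("In-sample R^2: ", r2_in)
print("Out-of-sample R^2:", r2_out)
```

The pattern matches the narrative: OLS's in-sample R-squared is flattered by the 20 noise factors, while the regularized models give up in-sample fit for out-of-sample stability.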
## How Lambda Is Chosen
**Cross-validation** is the standard approach:
1. Split your data into K folds (typically 5 or 10)
2. For each candidate lambda, train on K-1 folds and test on the held-out fold
3. Average the out-of-sample error across all folds
4. Choose the lambda that minimizes the average out-of-sample error
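The four steps above can be sketched directly with scikit-learn's `KFold` (the candidate lambda grid and simulated data are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n, p = 240, 15
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:4] = [0.8, -0.6, 0.4, 0.3]
y = X @ beta + rng.standard_normal(n)

lambdas = [0.001, 0.01, 0.05, 0.1, 0.5, 1.0]  # candidate penalty strengths
kf = KFold(n_splits=5, shuffle=True, random_state=0)  # step 1: K folds

cv_errors = {}
for lam in lambdas:
    fold_mse = []
    for train_idx, test_idx in kf.split(X):
        # Step 2: train on K-1 folds, test on the held-out fold
        model = Lasso(alpha=lam).fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx])
        fold_mse.append(np.mean(resid**2))
    cv_errors[lam] = np.mean(fold_mse)  # step 3: average OOS error

best_lam = min(cv_errors, key=cv_errors.get)  # step 4: pick the minimizer
print("Chosen lambda:", best_lam)
```

In practice `LassoCV` wraps this whole loop, but writing it out makes clear that lambda is chosen by out-of-sample error, never in-sample fit.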
## Elastic Net (Bonus)
The CFA curriculum also mentions **Elastic Net**, which combines both penalties:
Minimize Sum(residuals^2) + lambda_1 x Sum(|beta_j|) + lambda_2 x Sum(beta_j^2)
This gives you LASSO's variable selection plus Ridge's stability with correlated predictors. It's increasingly popular in quantitative investment strategies.
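A quick sketch of that behavior, using scikit-learn's `ElasticNet` (note it parameterizes the two penalties as a single `alpha` plus an `l1_ratio` mixing weight rather than separate lambda_1 and lambda_2; data and settings here are illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
n = 400
z = rng.standard_normal(n)  # shared underlying factor

# Columns 0 and 1 are a correlated pair of useful predictors;
# columns 2-5 are irrelevant noise factors
X = np.column_stack([
    z + 0.1 * rng.standard_normal(n),
    z + 0.1 * rng.standard_normal(n),
    rng.standard_normal((n, 4)),
])
y = z + rng.standard_normal(n)

# l1_ratio blends the penalties: 1.0 = pure LASSO, 0.0 = pure ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)
```

The L2 component encourages the correlated pair to share weight (instead of LASSO's arbitrary pick-one), while the L1 component keeps the noise-factor coefficients small or exactly zero.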
## What the Exam Tests
Expect conceptual questions: "Which technique performs variable selection?" (LASSO). "What happens to bias and variance as lambda increases?" (Bias increases, variance decreases). You may also see questions about interpreting cross-validation results.
For hands-on ML in finance tutorials and Level II quant practice, check out AcadiFi's CFA Level II Quantitative Methods module.