Regularization is one of the most practical additions to the CFA curriculum. Here's why it matters for finance and how to think about it.
## The Problem with OLS in Finance
Ordinary least squares (OLS) regression minimizes the sum of squared residuals — and nothing else. In finance, this creates two problems:
1. **Overfitting:** With many potential predictors (P/E, P/B, momentum, volatility, dividend yield, earnings revisions, etc.), OLS fits the training data closely but performs poorly out of sample. The model captures noise, not signal.
2. **Multicollinearity:** Financial variables are often highly correlated (e.g., P/E and earnings yield are near-perfect inverses). OLS coefficient estimates become unstable and uninterpretable.
## How Regularization Helps
Regularization adds a **penalty term** to the OLS objective function that discourages large coefficient values:
**OLS:** Minimize Sum(residuals^2)
**Ridge:** Minimize Sum(residuals^2) + lambda x Sum(beta_j^2)
**LASSO:** Minimize Sum(residuals^2) + lambda x Sum(|beta_j|)
The parameter **lambda** controls the strength of the penalty. Higher lambda = more shrinkage = simpler model.
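To make the shrinkage concrete, here's a minimal numpy sketch. Ridge has a closed-form solution, (X'X + lambda*I)^(-1) X'y, so we can compute it directly (LASSO has no closed form and needs an iterative solver). The simulated data and lambda value are illustrative, not from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.25]  # only 3 of 10 predictors matter
y = X @ beta_true + rng.standard_normal(n)

def ridge_coefs(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^(-1) X'y.
    lam = 0 recovers ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ols = ridge_coefs(X, y, 0.0)
ridge = ridge_coefs(X, y, 10.0)

# Higher lambda pulls the whole coefficient vector toward zero
print("OLS norm:  ", np.linalg.norm(ols))
print("Ridge norm:", np.linalg.norm(ridge))
```

The norm of the ridge solution is always smaller than the OLS norm for lambda > 0 — that's the shrinkage the penalty buys.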
## Ridge vs. LASSO
| Feature | Ridge (L2) | LASSO (L1) |
|---|---|---|
| Penalty type | Sum of squared coefficients | Sum of absolute coefficients |
| Coefficient behavior | Shrinks toward zero but never exactly zero | Can force coefficients to exactly zero |
| Variable selection | No — keeps all predictors | Yes — eliminates irrelevant ones |
| Multicollinearity | Handles well (distributes weight) | Picks one from correlated group, zeros others |
| Best when | Many small effects | Few important predictors |
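The multicollinearity row of the table is easy to demonstrate. In this sketch (synthetic data; penalty strengths chosen for illustration), two predictors are near-duplicates of the same underlying signal — scikit-learn's `Lasso` typically keeps one and zeroes the other, while `Ridge` spreads weight across both:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n = 300
z = rng.standard_normal(n)  # the true underlying signal

# Columns 0 and 1 are nearly collinear; column 2 is irrelevant noise
X = np.column_stack([
    z + 0.01 * rng.standard_normal(n),
    z + 0.01 * rng.standard_normal(n),
    rng.standard_normal(n),
])
y = 2 * z + rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("LASSO coefficients:", lasso.coef_)  # typically zeroes one of the pair
print("Ridge coefficients:", ridge.coef_)  # weight shared across the pair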
## Finance Example: Cross-Sectional Equity Return Prediction
**Setup:** You're building a model to predict next-month stock returns for 500 stocks in the Russell 1000 using 25 candidate factors.
**Using OLS:** You run the regression and get coefficients for all 25 factors. In-sample R-squared looks impressive at 12%. Out-of-sample? The model actually has *negative* predictive power. It memorized noise.
**Using Ridge (lambda = 0.5):** All 25 coefficients are shrunk toward zero. The model is more stable — out-of-sample R-squared improves to 1.8%. Not exciting, but positive.
**Using LASSO (lambda = 0.3):** The model zeroes out 18 of the 25 factors and keeps 7: earnings momentum, short interest, book-to-market, 12-month momentum (excluding last month), earnings surprise, analyst revision breadth, and accruals. Out-of-sample R-squared: 2.3%.
The LASSO result is more interpretable — you know exactly which factors matter — and slightly more powerful because it eliminated noise factors.
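The in-sample/out-of-sample gap described above is easy to reproduce on synthetic data. This sketch (factor count and penalty values are illustrative, not the article's actual model) fits OLS, Ridge, and LASSO on a small training window with 25 candidate factors, only 5 of which carry real signal, and compares R-squared in and out of sample:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score

rng = np.random.default_rng(4)
n_train, n_test, p = 120, 2000, 25
X = rng.standard_normal((n_train + n_test, p))
beta = np.zeros(p)
beta[:5] = 0.3  # only 5 of the 25 candidate factors carry signal
y = X @ beta + rng.standard_normal(n_train + n_test)
Xtr, Xte = X[:n_train], X[n_train:]
ytr, yte = y[:n_train], y[n_train:]

r2_in, r2_out = {}, {}
for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=10.0)),
                    ("LASSO", Lasso(alpha=0.05))]:
    model.fit(Xtr, ytr)
    r2_in[name] = r2_score(ytr, model.predict(Xtr))    # in-sample
    r2_out[name] = r2_score(yte, model.predict(Xte))   # out-of-sample

print("In-sample R^2: ", r2_in)
print("Out-of-sample R^2:", r2_out)
```

The pattern matches the narrative: OLS's in-sample R-squared is flattered by the 20 noise factors, while the regularized models give up in-sample fit for out-of-sample stability.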
## How Lambda Is Chosen
**Cross-validation** is the standard approach:
1. Split your data into K folds (typically 5 or 10)
2. For each candidate lambda, train on K-1 folds and test on the held-out fold
3. Average the out-of-sample error across all folds
4. Choose the lambda that minimizes the average out-of-sample error
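The four steps above can be sketched directly with scikit-learn's `KFold` (the candidate lambda grid and simulated data are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n, p = 240, 15
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:4] = [0.8, -0.6, 0.4, 0.3]
y = X @ beta + rng.standard_normal(n)

lambdas = [0.001, 0.01, 0.05, 0.1, 0.5, 1.0]  # candidate penalty strengths
kf = KFold(n_splits=5, shuffle=True, random_state=0)  # step 1: K folds

cv_errors = {}
for lam in lambdas:
    fold_mse = []
    for train_idx, test_idx in kf.split(X):
        # Step 2: train on K-1 folds, test on the held-out fold
        model = Lasso(alpha=lam).fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx])
        fold_mse.append(np.mean(resid**2))
    cv_errors[lam] = np.mean(fold_mse)  # step 3: average OOS error

best_lam = min(cv_errors, key=cv_errors.get)  # step 4: pick the minimizer
print("Chosen lambda:", best_lam)
```

In practice `LassoCV` wraps this whole loop, but writing it out makes clear that lambda is chosen by out-of-sample error, never in-sample fit.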
## Elastic Net (Bonus)
The CFA curriculum also mentions **Elastic Net**, which combines both penalties:
Minimize Sum(residuals^2) + lambda_1 x Sum(|beta_j|) + lambda_2 x Sum(beta_j^2)
This gives you LASSO's variable selection plus Ridge's stability with correlated predictors. It's increasingly popular in quantitative investment strategies.
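A quick sketch of that behavior, using scikit-learn's `ElasticNet` (note it parameterizes the two penalties as a single `alpha` plus an `l1_ratio` mixing weight rather than separate lambda_1 and lambda_2; data and settings here are illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
n = 400
z = rng.standard_normal(n)  # shared underlying factor

# Columns 0 and 1 are a correlated pair of useful predictors;
# columns 2-5 are irrelevant noise factors
X = np.column_stack([
    z + 0.1 * rng.standard_normal(n),
    z + 0.1 * rng.standard_normal(n),
    rng.standard_normal((n, 4)),
])
y = z + rng.standard_normal(n)

# l1_ratio blends the penalties: 1.0 = pure LASSO, 0.0 = pure ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)
```

The L2 component encourages the correlated pair to share weight (instead of LASSO's arbitrary pick-one), while the L1 component keeps the noise-factor coefficients small or exactly zero.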
## What the Exam Tests
Expect conceptual questions: "Which technique performs variable selection?" (LASSO). "What happens to bias and variance as lambda increases?" (Bias increases, variance decreases). You may also see questions about interpreting cross-validation results.
For hands-on ML in finance tutorials and Level II quant practice, check out AcadiFi's CFA Level II Quantitative Methods module.