AcadiFi
ElasticBlend_Tobias · 2026-04-06
CFA · Level II · Quantitative Methods

How does elastic net combine L1 and L2 penalties, and when does it outperform pure LASSO or ridge?

The CFA curriculum mentions elastic net as a compromise between ridge and LASSO. I know it uses both penalties, but I'm unsure about the mixing parameter alpha. When exactly does elastic net give meaningfully better results than using either method alone?

86 upvotes
Verified Expert
AcadiFi Certified Professional

Elastic net combines the L1 (LASSO) and L2 (ridge) penalties through a mixing parameter alpha, inheriting LASSO's variable-selection ability and ridge's stability with correlated predictors. It addresses specific weaknesses that each method has on its own.

Objective Function:

Elastic net minimizes:

sum_i (y_i - X_i * beta)^2 + lambda * [alpha * sum_j |beta_j| + (1 - alpha) * sum_j beta_j^2]

- alpha = 1: pure LASSO
- alpha = 0: pure ridge
- 0 < alpha < 1: elastic net blend

Why Pure LASSO Fails with Correlated Predictors:

When two predictors are highly correlated (say, rho > 0.9), LASSO tends to select one arbitrarily and zero out the other. This is problematic when both variables carry meaningful information. Ridge keeps both but cannot eliminate truly irrelevant variables.

Worked Example:

Analyst Tobias at Riverdale Quant models credit default probability using 20 financial ratios. Several ratios cluster by category: three leverage ratios are correlated at 0.85+, and four profitability ratios at 0.90+.

| Method | Variables Selected | CV Error (bps) |
|---|---|---|
| OLS (all 20) | 20 | 145 |
| Ridge (lambda = 3.1) | 20 (all nonzero) | 98 |
| LASSO (lambda = 1.8) | 6 | 87 |
| Elastic Net (alpha = 0.5, lambda = 2.2) | 9 | 72 |

LASSO picks one leverage ratio and one profitability ratio, discarding the others arbitrarily. Elastic net retains two leverage ratios and two profitability ratios that are genuinely predictive, while still zeroing out the 11 noise variables.

Grouped Selection Property:

Elastic net's signature advantage is grouped selection: it tends to include or exclude correlated variables together rather than making arbitrary choices among them. This produces more stable, more interpretable models.

Tuning:

Elastic net requires tuning two hyperparameters (alpha and lambda), typically via a 2D grid search with cross-validation:

1. Create a grid of alpha values (e.g., 0.1, 0.3, 0.5, 0.7, 0.9)
2. For each alpha, find the optimal lambda via K-fold cross-validation
3. Select the (alpha, lambda) pair with the lowest CV error

When to Use Each Method:

- Ridge: Many predictors, all relevant, high multicollinearity
- LASSO: Many predictors, most are noise, limited correlation among signal variables
- Elastic Net: Many predictors, groups of correlated variables, need both selection and stability

Practice regularization comparisons in our CFA Quantitative Methods question bank.
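The tuning workflow above can be sketched in a few lines of Python with scikit-learn. This is an illustrative example on synthetic data (not from the worked example above): two groups of highly correlated "signal" predictors plus noise columns loosely mirror the leverage/profitability grouping. Note one naming flip: scikit-learn's `alpha` parameter is the penalty strength (lambda in this answer), and `l1_ratio` is the L1/L2 mix (alpha in this answer).

```python
# Illustrative sketch on synthetic data: LASSO vs elastic net with
# correlated predictor groups, tuned via cross-validation.
# Naming note: sklearn's `alpha` = penalty strength (lambda above),
# sklearn's `l1_ratio` = L1/L2 mixing parameter (alpha above).
import numpy as np
from sklearn.linear_model import LassoCV, ElasticNetCV

rng = np.random.default_rng(0)
n = 500

# Group 1: three near-duplicates of one latent factor (like leverage ratios).
base1 = rng.normal(size=n)
group1 = np.column_stack([base1 + 0.1 * rng.normal(size=n) for _ in range(3)])
# Group 2: four near-duplicates of a second factor (like profitability ratios).
base2 = rng.normal(size=n)
group2 = np.column_stack([base2 + 0.1 * rng.normal(size=n) for _ in range(4)])
# 11 pure-noise predictors, for 18 columns total.
noise = rng.normal(size=(n, 11))
X = np.hstack([group1, group2, noise])
y = base1 + base2 + 0.5 * rng.normal(size=n)

# LASSO: CV picks lambda only (alpha fixed at 1).
lasso = LassoCV(cv=5, random_state=0).fit(X, y)

# Elastic net 2D search: for each l1_ratio in the grid, CV finds the best
# penalty strength, then the (l1_ratio, lambda) pair with lowest CV error wins.
enet = ElasticNetCV(l1_ratio=[0.1, 0.3, 0.5, 0.7, 0.9],
                    cv=5, random_state=0).fit(X, y)

def n_selected(coefs, cols):
    """Count coefficients in a column slice that survived the penalty."""
    return int(np.sum(np.abs(coefs[cols]) > 1e-6))

for name, model in [("LASSO", lasso), ("ElasticNet", enet)]:
    print(name,
          "group1 kept:", n_selected(model.coef_, slice(0, 3)),
          "group2 kept:", n_selected(model.coef_, slice(3, 7)),
          "noise kept:", n_selected(model.coef_, slice(7, 18)))
```

On data like this, elastic net typically keeps several members of each correlated group (grouped selection), while LASSO concentrates weight on one representative per group; exact counts depend on the CV-chosen penalties.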


Master Level II with our CFA Course

107 lessons · 200+ hours · Expert instruction

#elastic-net #l1-l2 #regularization #grouped-selection #correlated-predictors