AcadiFi · CFA Level II · Quantitative Methods
RidgeShrink_Haruki · 2026-04-08

How does ridge regression use an L2 penalty to handle multicollinearity, and how do you choose the penalty parameter?

I understand that OLS can produce unstable coefficients when predictors are highly correlated. Ridge regression supposedly fixes this by adding a penalty term. But what does the L2 penalty actually do to the coefficient estimates geometrically? And how does lambda control the tradeoff between bias and variance?

99 upvotes
Verified Expert
AcadiFi Certified Professional

Ridge regression adds a squared-magnitude penalty (the L2 norm) to the OLS objective function, shrinking coefficients toward zero without eliminating any. This stabilizes estimates when predictors are correlated, trading a small amount of bias for a large reduction in variance.

Objective Function:

OLS minimizes: sum_i (y_i - X_i beta)^2

Ridge minimizes: sum_i (y_i - X_i beta)^2 + lambda * sum_j beta_j^2

The closed-form solution is: beta_ridge = (X'X + lambda*I)^{-1} X'y

Adding lambda*I to X'X makes the matrix invertible even when predictors are perfectly collinear.

Worked Example:

Analyst Haruki at Pinebrook Capital regresses quarterly earnings surprises on three predictors: analyst sentiment (X1), momentum (X2), and a sentiment-momentum composite (X3). Because X3 is nearly a linear combination of X1 and X2, OLS produces:

| | OLS Coefficients | Std Error |
|---|---|---|
| X1 | 12.4 | 8.7 |
| X2 | -9.8 | 7.2 |
| X3 | 6.1 | 11.3 |

The coefficients are large in absolute value with enormous standard errors -- classic multicollinearity.

Applying ridge with lambda = 2.5:

| | Ridge Coefficients | Effective Std Error |
|---|---|---|
| X1 | 3.8 | 1.9 |
| X2 | -2.1 | 1.6 |
| X3 | 1.4 | 2.0 |

All coefficients shrink substantially, standard errors drop, and predictions become much more stable out-of-sample.

Choosing Lambda:

The penalty parameter lambda controls the bias-variance tradeoff:
- lambda = 0: pure OLS (no shrinkage, possibly high variance)
- lambda approaching infinity: all coefficients shrink toward zero (high bias)
- Optimal lambda: typically found via cross-validation, selecting the value that minimizes out-of-sample prediction error

Geometric Interpretation:

The L2 penalty constrains coefficients to lie within a hypersphere (a circle in 2D) centered at the origin. OLS finds the unconstrained minimum; ridge finds the point on the constraint boundary closest to the OLS solution. Because the constraint region is round, with no corners, ridge shrinks coefficients toward zero but never sets any exactly to zero.

Key Properties for CFA:
- Ridge always retains all predictors (no variable selection)
- Works best when all predictors contribute some information but are correlated
- Requires standardizing predictors first so the penalty treats them equally
- Does not produce sparse models (contrast with LASSO)

Practice regularization techniques in our CFA Quantitative Methods question bank.
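The closed-form solution beta_ridge = (X'X + lambda*I)^{-1} X'y is short enough to sketch directly in numpy. This is an illustrative example on synthetic data (not the Pinebrook Capital numbers): three standardized predictors where X3 is nearly a linear combination of X1 and X2, mirroring the multicollinearity setup above.

```python
import numpy as np

# Synthetic, near-collinear design: x3 is almost a linear combination of x1, x2.
rng = np.random.default_rng(42)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = 0.5 * x1 + 0.5 * x2 + 0.01 * rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize so the penalty treats predictors equally
y = 1.0 * x1 - 0.5 * x2 + rng.standard_normal(n)

def ridge_closed_form(X, y, lam):
    """beta = (X'X + lam * I)^{-1} X'y  -- the closed-form ridge solution."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge_closed_form(X, y, 0.0)    # lam = 0 recovers plain OLS
beta_ridge = ridge_closed_form(X, y, 2.5)

# Shrinkage: the ridge coefficient vector is strictly shorter (in L2 norm) than OLS's.
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True
```

Note the use of `np.linalg.solve` rather than explicitly inverting the matrix, which is both faster and numerically safer when X'X is ill-conditioned.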


Master Level II with our CFA Course

107 lessons · 200+ hours · Expert instruction

#ridge-regression #l2-penalty #multicollinearity #regularization #shrinkage