How does ridge regression use an L2 penalty to handle multicollinearity, and how do you choose the penalty parameter?
I understand that OLS can produce unstable coefficients when predictors are highly correlated. Ridge regression supposedly fixes this by adding a penalty term. But what does the L2 penalty actually do to the coefficient estimates geometrically? And how does lambda control the tradeoff between bias and variance?
Ridge regression adds a squared-magnitude penalty (the L2 norm) to the OLS objective, shrinking coefficients toward zero without eliminating any. This stabilizes estimates when predictors are correlated, trading a small amount of bias for a large reduction in variance.

Objective Function:

OLS minimizes: sum_i (y_i - x_i'beta)^2

Ridge minimizes: sum_i (y_i - x_i'beta)^2 + lambda * sum_j beta_j^2

The closed-form solution is: beta_ridge = (X'X + lambda*I)^{-1} X'y

Adding lambda*I to X'X makes the matrix invertible even when predictors are perfectly collinear.

Worked Example:

Analyst Haruki at Pinebrook Capital regresses quarterly earnings surprises on three predictors: analyst sentiment (X1), momentum (X2), and a sentiment-momentum composite (X3). Because X3 is nearly a linear combination of X1 and X2, OLS produces:

| Predictor | OLS Coefficient | Std Error |
|---|---|---|
| X1 | 12.4 | 8.7 |
| X2 | -9.8 | 7.2 |
| X3 | 6.1 | 11.3 |

The coefficients are large in absolute value with enormous standard errors -- classic multicollinearity.

Applying ridge with lambda = 2.5:

| Predictor | Ridge Coefficient | Effective Std Error |
|---|---|---|
| X1 | 3.8 | 1.9 |
| X2 | -2.1 | 1.6 |
| X3 | 1.4 | 2.0 |

All coefficients shrink substantially, standard errors drop, and out-of-sample predictions become much more stable.

Choosing Lambda:

The penalty parameter lambda controls the bias-variance tradeoff:
- lambda = 0: pure OLS (no shrinkage, possibly high variance)
- lambda approaching infinity: all coefficients shrink toward zero (high bias)
- Optimal lambda: typically chosen by cross-validation, selecting the value that minimizes out-of-sample prediction error

Geometric Interpretation:

The L2 penalty constrains coefficients to lie within a hypersphere (a circle in 2D) centered at the origin. OLS finds the unconstrained minimum; ridge finds the point where the elliptical contours of the residual sum of squares first touch the constraint boundary.
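The closed-form solution and the shrinkage effect can be sketched in a few lines of numpy. The data below is hypothetical, constructed so that X3 is nearly a linear combination of X1 and X2, mirroring the worked example; the coefficient values and lambda = 2.5 are illustrative, not a reproduction of Haruki's regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: x3 is nearly a linear combination of x1 and x2,
# creating the multicollinearity the text describes.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.5 * x1 + 0.5 * x2 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)    # lam = 0 reduces to OLS
beta_ridge = ridge(X, y, 2.5)

# Ridge shrinks the coefficient vector toward zero relative to OLS.
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True
```

For any lambda > 0 the ridge estimate has strictly smaller norm than the OLS estimate, because each singular-value direction of X is shrunk by a factor of d^2 / (d^2 + lambda).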
Because the constraint region is round, with no corners on the coordinate axes, ridge shrinks all coefficients toward zero but never sets any exactly to zero.

Key Properties for CFA:
- Ridge always retains all predictors (no variable selection)
- Works best when all predictors contribute some information but are correlated
- Requires standardizing predictors first so the penalty treats them equally
- Does not produce sparse models (contrast with LASSO, whose L1 penalty can zero out coefficients)
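The standardization step and cross-validated choice of lambda can be sketched together. This is a minimal illustration on simulated data with a hand-picked lambda grid; in practice a finer (often logarithmic) grid and a library routine such as scikit-learn's RidgeCV would be used.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 120, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -1.0, 0.5]) + rng.normal(scale=1.0, size=n)

# Standardize predictors so the penalty treats them equally.
X = (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Mean squared prediction error across k folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        beta = ridge_fit(X[train_idx], y[train_idx], lam)
        resid = y[test_idx] - X[test_idx] @ beta
        errs.append(np.mean(resid ** 2))
    return float(np.mean(errs))

# Pick the lambda with the lowest out-of-sample error.
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
best = min(lambdas, key=lambda lam: cv_error(X, y, lam))
print(best)
```

Note that lambda = 0 is included in the grid, so cross-validation itself decides whether any shrinkage helps on this data.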