Why does the LASSO's L1 penalty produce sparse models by setting some coefficients exactly to zero?
I learned that ridge regression shrinks coefficients but keeps them all nonzero. LASSO uses the absolute value penalty instead and can eliminate variables entirely. But why does switching from squared to absolute value create this sparsity property? I'm struggling with the geometric intuition behind this difference.
The LASSO (Least Absolute Shrinkage and Selection Operator) uses an L1 penalty -- the sum of the absolute values of the coefficients -- rather than ridge's squared L2 penalty. This seemingly small change has a profound geometric consequence: the L1 constraint region has corners on the axes, and the constrained optimum often lands on one of those corners, setting some coefficients exactly to zero.

Objective Function:

LASSO minimizes: sum_i (y_i - X_i * beta)^2 + lambda * sum_j |beta_j|

Geometric Insight:

```mermaid
graph LR
    subgraph "Ridge (L2)"
        A["Circular constraint<br/>No corners on axes<br/>All coefficients nonzero"]
    end
    subgraph "LASSO (L1)"
        B["Diamond constraint<br/>Corners on axes<br/>Some coefficients = 0"]
    end
    A --> C["Shrinks but keeps all variables"]
    B --> D["Shrinks AND selects variables"]
```

Imagine the OLS contours (ellipses of equal RSS) expanding outward from the unconstrained minimum. For ridge, they hit the circular boundary, which is smooth everywhere -- the contact point almost never falls exactly on an axis. For LASSO, the diamond has sharp corners on the axes, and the expanding ellipse frequently touches a corner first, zeroing out that dimension's coefficient.

Worked Example:

Portfolio manager Svetlana at Ashworth Investments uses 15 macroeconomic variables to predict the equity risk premium. Most are noisy or redundant.

OLS with 15 variables: R-squared = 0.28, adjusted R-squared = 0.11 (many insignificant coefficients).

LASSO with lambda chosen by 10-fold cross-validation:

| Variable | OLS Coefficient | LASSO Coefficient |
|---|---|---|
| Term Spread | 0.42 | 0.31 |
| Credit Spread | 0.38 | 0.25 |
| Earnings Yield | 0.29 | 0.18 |
| Inflation Surprise | -0.15 | -0.08 |
| Industrial Production | 0.22 | 0.00 |
| Consumer Confidence | 0.08 | 0.00 |
| Trade Balance | -0.04 | 0.00 |
| ... (8 more) | various | 0.00 |

LASSO keeps 4 variables and zeros out 11. The sparse model has a cross-validated R-squared of 0.16 -- lower than OLS in-sample but dramatically better out-of-sample.

LASSO vs. Ridge Summary:
- LASSO produces interpretable sparse models (built-in variable selection)
- Ridge is better when all variables contribute signal and are correlated
- LASSO struggles when predictors are highly correlated (it tends to pick one arbitrarily)
- LASSO's solution path is piecewise linear, making it computationally efficient

CFA Exam Tips:
- LASSO = L1 = sparsity = variable selection
- Ridge = L2 = shrinkage without selection
- Both require standardized predictors and cross-validation to choose lambda
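The corner-touching picture has an algebraic counterpart. In the special case of a single standardized predictor (or an orthonormal design -- an assumption beyond the source's general setup), the LASSO solution has a closed form: the OLS estimate soft-thresholded by lambda. A minimal sketch:

```python
# Soft-thresholding: closed-form LASSO solution for one standardized
# predictor (orthonormal-design assumption). Any OLS coefficient whose
# magnitude is below lambda is set EXACTLY to zero -- the algebraic
# source of sparsity. Ridge, by contrast, only rescales, so a nonzero
# OLS estimate stays nonzero.
import numpy as np

def soft_threshold(beta_ols: float, lam: float) -> float:
    """Return sign(beta_ols) * max(|beta_ols| - lam, 0)."""
    return float(np.sign(beta_ols) * max(abs(beta_ols) - lam, 0.0))

print(soft_threshold(0.42, 0.10))   # strong signal survives, shrunk toward zero
print(soft_threshold(0.04, 0.10))   # weak coefficient is zeroed exactly
print(soft_threshold(-0.15, 0.10))  # sign is preserved for survivors
```

This makes the contrast with ridge precise: ridge's shrinkage factor multiplies every coefficient by a number strictly between 0 and 1, so nothing ever reaches zero; soft-thresholding subtracts a fixed amount and clips at zero.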
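The keep-4-drop-11 pattern from the worked example can be reproduced on simulated data. This is a minimal sketch, not Svetlana's actual dataset: the sample size, seed, and penalty strengths (`alpha=0.1` for LASSO, `alpha=10` for ridge) are illustrative assumptions.

```python
# Contrast LASSO's exact zeros with ridge's shrink-but-keep behaviour
# using scikit-learn, on simulated data where 4 of 15 standardized
# predictors carry signal and 11 are pure noise.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 200, 15
X = rng.standard_normal((n, p))          # features already ~standardized
beta = np.zeros(p)
beta[:4] = [0.40, 0.35, 0.30, -0.15]     # only the first 4 carry signal
y = X @ beta + 0.5 * rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)       # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)      # L2 penalty

print("LASSO coefficients set exactly to zero:", int(np.sum(lasso.coef_ == 0)))
print("Ridge coefficients set exactly to zero:", int(np.sum(ridge.coef_ == 0)))
```

With the L1 penalty, most of the 11 noise coefficients come out exactly zero; the ridge fit shrinks them toward zero but leaves every one nonzero.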