Why is stepwise regression considered dangerous, and what are the main pitfalls of automated variable selection?
In my CFA quant review, the textbook strongly warns against stepwise regression. But it seems like a convenient way to find the best predictors automatically. If I'm running forward selection and adding variables one at a time based on p-values, what exactly goes wrong? Why do practitioners call it data mining?
Stepwise regression automates variable selection by iteratively adding or removing predictors based on statistical significance thresholds. While convenient, it introduces multiple serious problems that can produce misleading models.

How Stepwise Works:

- Forward selection: Start with no variables, add the most significant one at each step
- Backward elimination: Start with all variables, remove the least significant one at each step
- Bidirectional: Combine both, adding and removing at each step

The Core Problems:

```mermaid
graph TD
    A["Stepwise Regression"] --> B["Multiple Testing Problem<br/>Tests hundreds of combinations"]
    A --> C["Inflated R-squared<br/>Overfits training data"]
    A --> D["Biased Coefficients<br/>Selected vars have inflated estimates"]
    A --> E["Unstable Selection<br/>Small data changes flip variables"]
    A --> F["Invalid p-values<br/>Standard errors are too small"]
    B --> G["False Discoveries"]
    C --> G
    D --> G
    E --> G
    F --> G
    G --> H["Model fails out-of-sample"]
```

Worked Example:

Researcher Simone at Hartwell Economics has 120 monthly observations of stock returns and 40 candidate macro predictors. She runs forward stepwise selection.

With 40 candidates at alpha = 0.05, the probability of finding at least one spuriously significant variable by chance alone is:

P(at least one false positive) = 1 - (1 - 0.05)^40 = 1 - 0.95^40 = 1 - 0.129 = 0.871, or about 87.1%

Stepwise selects 6 variables with an in-sample R-squared of 0.34. But when Simone tests on 60 months of holdout data, R-squared drops to 0.04 -- almost no explanatory power. The model was fitting noise.

Specific Dangers:

1. P-value distortion: After searching across many models, reported p-values no longer reflect true significance levels. A variable showing p = 0.02 after stepwise selection might have a corrected p-value above 0.10.

2. Coefficient bias: Variables that survive selection are systematically those with larger sample estimates (by luck or noise). Their coefficients are biased upward in absolute value.

3. Instability: Remove or add a few data points, and the selected model can change dramatically. This makes the approach unreliable for inference.

Better Alternatives:

- Use information criteria (AIC, BIC), which penalize complexity
- Apply regularization (ridge, LASSO), which shrinks coefficients
- Use cross-validation to honestly evaluate out-of-sample performance
- Let economic theory guide variable selection rather than the data alone

Explore principled model-building techniques in our CFA Quantitative Methods course.
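The false-discovery arithmetic in the worked example can be checked directly. A minimal sketch in Python, using the numbers from Simone's setup (40 candidates, alpha = 0.05):

```python
# Probability of at least one spurious "significant" predictor when
# screening many independent candidates at a fixed alpha level.
def prob_false_positive(n_candidates: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) = 1 - (1 - alpha)^n."""
    return 1 - (1 - alpha) ** n_candidates

# Simone's setup: 40 candidate macro predictors screened at alpha = 0.05
p = prob_false_positive(40)
print(f"{p:.3f}")  # prints 0.871 -- an 87.1% chance of at least one false discovery
```

The independence assumption behind the formula is itself optimistic: correlated predictors and the sequential nature of stepwise search make the true false-discovery risk even harder to control.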
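Simone's out-of-sample collapse can be reproduced with pure noise. The sketch below is a hypothetical simulation (standard library only, invented data): it screens 40 random predictors against a random target, keeps the single best one -- the first step of forward selection -- and then checks that variable on a holdout sample:

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(42)
n_train, n_test, n_candidates = 120, 60, 40

# Everything here is pure noise: the "returns" have no true
# relationship to any of the 40 candidate predictors.
y_train = [random.gauss(0, 1) for _ in range(n_train)]
y_test = [random.gauss(0, 1) for _ in range(n_test)]
candidates = [
    ([random.gauss(0, 1) for _ in range(n_train)],
     [random.gauss(0, 1) for _ in range(n_test)])
    for _ in range(n_candidates)
]

# Forward selection, first step: keep the candidate with the
# strongest in-sample correlation.
best = max(candidates, key=lambda c: abs(corr(c[0], y_train)))
in_sample = abs(corr(best[0], y_train))
out_sample = abs(corr(best[1], y_test))

print(f"in-sample |corr| of selected variable:    {in_sample:.2f}")
print(f"out-of-sample |corr| of same variable:    {out_sample:.2f}")
# The winner's in-sample correlation is inflated by the search itself;
# on fresh data the same variable is indistinguishable from noise.
```

Under most seeds the screened winner shows a respectable in-sample correlation while its holdout correlation hovers near zero, mirroring the R-squared collapse from 0.34 to 0.04 in the worked example.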