Why is stepwise regression considered dangerous, and what are the main pitfalls of automated variable selection?
In my CFA quant review, the textbook strongly warns against stepwise regression. But it seems like a convenient way to find the best predictors automatically. If I'm running forward selection and adding variables one at a time based on p-values, what exactly goes wrong? Why do practitioners call it data mining?
Stepwise regression automates variable selection by iteratively adding or removing predictors based on statistical significance thresholds. While convenient, it introduces multiple serious problems that can produce misleading models.

How Stepwise Works:

- Forward selection: Start with no variables, add the most significant one at each step
- Backward elimination: Start with all variables, remove the least significant one at each step
- Bidirectional: Combine both, adding and removing at each step

The Core Problems:

```mermaid
graph TD
    A["Stepwise Regression"] --> B["Multiple Testing Problem<br/>Tests hundreds of combinations"]
    A --> C["Inflated R-squared<br/>Overfits training data"]
    A --> D["Biased Coefficients<br/>Selected vars have inflated estimates"]
    A --> E["Unstable Selection<br/>Small data changes flip variables"]
    A --> F["Invalid p-values<br/>Standard errors are too small"]
    B --> G["False Discoveries"]
    C --> G
    D --> G
    E --> G
    F --> G
    G --> H["Model fails out-of-sample"]
```

Worked Example:

Researcher Simone at Hartwell Economics has 120 monthly observations of stock returns and 40 candidate macro predictors. She runs forward stepwise selection.

With 40 candidates at alpha = 0.05, the probability of finding at least one spuriously significant variable by chance alone is:

P(at least one false positive) = 1 - (1 - 0.05)^40 = 1 - 0.95^40 = 1 - 0.129 = 0.871, or about 87.1%

Stepwise selects 6 variables with an in-sample R-squared of 0.34. But when Simone tests on 60 months of holdout data, R-squared drops to 0.04 -- almost no explanatory power. The model was fitting noise.

Specific Dangers:

1. P-value distortion: After searching across many models, reported p-values no longer reflect true significance levels. A variable showing p = 0.02 after stepwise selection might have a corrected p-value above 0.10.

2. Coefficient bias: Variables that survive selection are systematically those with larger sample estimates (by luck or noise). Their coefficients are biased upward in absolute value.

3. Instability: Remove or add a few data points, and the selected model can change dramatically. This makes the approach unreliable for inference.

Better Alternatives:

- Use information criteria (AIC, BIC), which penalize complexity
- Apply regularization (ridge, LASSO), which shrinks coefficients
- Use cross-validation to honestly evaluate out-of-sample performance
- Let economic theory guide variable selection rather than the data alone

Explore principled model-building techniques in our CFA Quantitative Methods course.
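The false-discovery arithmetic in the worked example can be checked directly. A minimal sketch in Python, using the numbers from Simone's setup (40 candidates, alpha = 0.05):

```python
# Probability of at least one spurious "significant" predictor when
# screening many independent candidates at a fixed alpha level.
def prob_false_positive(n_candidates: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) = 1 - (1 - alpha)^n."""
    return 1 - (1 - alpha) ** n_candidates

# Simone's setup: 40 candidate macro predictors screened at alpha = 0.05
p = prob_false_positive(40)
print(f"{p:.3f}")  # prints 0.871 -- an 87.1% chance of at least one false discovery
```

The independence assumption behind the formula is itself optimistic: correlated predictors and the sequential nature of stepwise search make the true false-discovery risk even harder to control.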
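Simone's out-of-sample collapse can be reproduced with pure noise. The sketch below is a hypothetical simulation (standard library only, invented data): it screens 40 random predictors against a random target, keeps the single best one -- the first step of forward selection -- and then checks that variable on a holdout sample:

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(42)
n_train, n_test, n_candidates = 120, 60, 40

# Everything here is pure noise: the "returns" have no true
# relationship to any of the 40 candidate predictors.
y_train = [random.gauss(0, 1) for _ in range(n_train)]
y_test = [random.gauss(0, 1) for _ in range(n_test)]
candidates = [
    ([random.gauss(0, 1) for _ in range(n_train)],
     [random.gauss(0, 1) for _ in range(n_test)])
    for _ in range(n_candidates)
]

# Forward selection, first step: keep the candidate with the
# strongest in-sample correlation.
best = max(candidates, key=lambda c: abs(corr(c[0], y_train)))
in_sample = abs(corr(best[0], y_train))
out_sample = abs(corr(best[1], y_test))

print(f"in-sample |corr| of selected variable:    {in_sample:.2f}")
print(f"out-of-sample |corr| of same variable:    {out_sample:.2f}")
# The winner's in-sample correlation is inflated by the search itself;
# on fresh data the same variable is indistinguishable from noise.
```

Under most seeds the screened winner shows a respectable in-sample correlation while its holdout correlation hovers near zero, mirroring the R-squared collapse from 0.34 to 0.04 in the worked example.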