What does R-squared really tell you, and what are its limitations?
CFA Level I regression section. I know R² measures 'goodness of fit' and ranges from 0 to 1, but when is a high R² meaningful vs. misleading? Can a model with R² = 0.95 still be useless?
R² (coefficient of determination) is the most commonly reported regression statistic, but it's widely misunderstood. Let's get precise about what it does and doesn't tell you.
What R² Measures:
R² = 1 - (SSE / SST)
Where:
- SST (Total Sum of Squares) = Total variation in Y
- SSE (Sum of Squared Errors) = Unexplained variation
- SSR (Sum of Squares Regression) = Explained variation
- SST = SSR + SSE
R² = SSR / SST = Proportion of Y's variation explained by X
Example:
R² = 0.72 means 72% of the variation in the dependent variable is explained by the independent variable(s). The remaining 28% is unexplained.
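To make the decomposition concrete, here is a minimal sketch that fits a one-variable regression with NumPy and computes R² both ways. The eight data points are made up for illustration, and the variable names are not from any particular library.

```python
import numpy as np

# Hypothetical data: 8 observations of one X and one Y (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.5, 6.9, 8.4, 8.8])

# Fit Y = b0 + b1*X by ordinary least squares (intercept included).
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

sst = np.sum((y - y.mean()) ** 2)      # total variation in Y
sse = np.sum((y - y_hat) ** 2)         # unexplained variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation

r2_from_sse = 1 - sse / sst
r2_from_ssr = ssr / sst

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}")
print(f"R^2 = 1 - SSE/SST = {r2_from_sse:.4f}")
print(f"R^2 = SSR/SST     = {r2_from_ssr:.4f}")
```

The two R² values agree because SST = SSR + SSE holds for an OLS fit that includes an intercept.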
When high R² is meaningful:
- Cross-sectional models with genuine economic relationships (e.g., company size explaining analyst coverage)
- The slope coefficient is statistically significant
- The model passes residual diagnostics
When high R² is misleading:
1. Spurious correlation:
Regressing US GDP on world population can give an R² near 0.99: both trend upward over time, but neither causes the other. Two series that share a time trend will often produce a high R² even when they are otherwise unrelated.
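A small simulation illustrates the point. This is a sketch with synthetic data, not real GDP or population figures: two series are generated independently and share nothing except an upward trend. The helper `r_squared` and the series names are hypothetical.

```python
import numpy as np

def r_squared(x, y):
    """R^2 from an OLS fit of y on x with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
t = np.arange(500)

# Built independently; the only thing they have in common is a time trend.
series_a = 2.0 * t + rng.normal(0, 20, size=t.size)
series_b = 5.0 * t + rng.normal(0, 50, size=t.size)

# Regress levels on levels, then period-to-period changes on changes.
r2_levels = r_squared(series_a, series_b)
r2_changes = r_squared(np.diff(series_a), np.diff(series_b))

print(f"R^2 in levels:  {r2_levels:.3f}")
print(f"R^2 in changes: {r2_changes:.3f}")
```

Regressing changes on changes removes the shared trend, so the R² collapses. A high R² in levels that disappears in changes is a common sign that the fit was driven by the trend alone.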
2. Overfitting:
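Adding more variables to a regression never decreases R² and almost always increases it, even when the added variables are random noise. That's why we also check Adjusted R², which penalizes additional variables:
Adj R² = 1 - [(1 - R²)(n - 1) / (n - k - 1)]
where n is the number of observations and k is the number of independent variables. Unlike R², Adjusted R² can fall when a new variable adds little explanatory power, and it can even be negative.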
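The sketch below, using simulated data, adds pure-noise regressors to a one-variable model and tracks both statistics. The function names `r_squared` and `adjusted_r2` are hypothetical helpers written for this example.

```python
import numpy as np

def r_squared(features, y):
    """R^2 from an OLS fit of y on the given features plus an intercept."""
    X = np.column_stack([np.ones(len(y)), features])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2, n, k):
    """Adj R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # y truly depends on x only
noise = rng.normal(size=(n, 10))         # 10 columns unrelated to y

r2_values, adj_values = [], []
for extra in (0, 5, 10):                 # number of noise columns added
    features = np.column_stack([x, noise[:, :extra]])
    k = features.shape[1]
    r2 = r_squared(features, y)
    r2_values.append(r2)
    adj_values.append(adjusted_r2(r2, n, k))
    print(f"k={k:2d}  R^2={r2:.3f}  Adj R^2={adj_values[-1]:.3f}")
```

R² cannot go down as noise columns are added, while Adjusted R² always sits below R² and is free to fall. As a hand check of the formula: R² = 0.72 with n = 30 and k = 1 gives Adj R² = 1 - (0.28 × 29 / 28) = 0.71.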
3. Non-linear relationships:
R² from a linear regression measures only linear fit. If Y and X have a U-shaped relationship, the linear regression may have an R² near zero even though X strongly predicts Y.
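An extreme case, sketched with synthetic data: Y is exactly X², so X determines Y completely, yet a straight-line fit explains essentially none of the variation. The helper `r_squared` is a hypothetical function written for this example.

```python
import numpy as np

def r_squared(features, y):
    """R^2 from an OLS fit of y on the given features plus an intercept."""
    X = np.column_stack([np.ones(len(y)), features])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# X is symmetric around zero and Y is a perfect U-shape in X.
x = np.linspace(-3, 3, 61)
y = x ** 2

r2_linear = r_squared(x, y)                               # Y on X only
r2_quadratic = r_squared(np.column_stack([x, x ** 2]), y)  # Y on X and X^2

print(f"Linear fit R^2:    {r2_linear:.4f}")
print(f"Quadratic fit R^2: {r2_quadratic:.4f}")
```

Adding the squared term recovers the relationship, which is why a residual plot is worth checking before concluding that a low R² means X is uninformative.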
For simple regression (one X variable):
R² = r² (the square of the correlation coefficient)
If r = 0.85, then R² = 0.7225 = 72.25%
If r = -0.90, then R² = 0.81 = 81% (R² is never negative here, so it does not show the direction of the relationship)
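The identity can be checked numerically. The sketch below uses simulated data with a negative slope; the helper `r_squared` is hypothetical, while `np.corrcoef` is NumPy's standard correlation function.

```python
import numpy as np

def r_squared(x, y):
    """R^2 from an OLS fit of y on a single x with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 3.0 - 0.8 * x + rng.normal(scale=0.5, size=200)  # negative relationship

r = np.corrcoef(x, y)[0, 1]   # sample correlation coefficient
r2 = r_squared(x, y)          # regression R^2

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, regression R^2 = {r2:.4f}")
```

The correlation is negative, but squaring it gives the same value as the regression R². This shortcut applies only to simple regression with one X variable.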
Practical interpretation guide:
| R² Value | Context | Interpretation |
|---|---|---|
| 0.95+ | Time series macro | Possibly spurious (check for trends) |
| 0.70-0.90 | Stock factor model | Strong explanatory power |
| 0.30-0.50 | Cross-sectional stock returns | Good for noisy financial data |
| 0.05-0.15 | Daily return prediction | Typical — returns are hard to predict |
Exam tip: Don't evaluate a model on R² alone. The CFA exam may present a model with high R² but insignificant coefficients, residual patterns, or obvious spurious correlation — you need to recognize these red flags.