How does principal component regression reduce dimensionality, and what are its limitations for prediction?
I'm covering PCR in CFA quant methods. The idea is to extract principal components from the predictors and regress Y on those instead. But the components maximize variance in X, not correlation with Y. Doesn't that mean the most important components for prediction might be ignored?
Principal component regression (PCR) addresses multicollinearity and high dimensionality by replacing the original correlated predictors with a smaller set of uncorrelated principal components. However, you have correctly identified its fundamental limitation: the components that capture the most variance in X are not necessarily the most relevant for predicting Y.

PCR Algorithm:

```mermaid
graph TD
  A["Original Predictors<br/>X₁, X₂, ..., X_p (correlated)"] --> B["PCA on X matrix"]
  B --> C["PC₁ explains 45% of X variance"]
  B --> D["PC₂ explains 25% of X variance"]
  B --> E["PC₃ explains 15% of X variance"]
  B --> F["PC₄...PC_p<br/>remaining 15%"]
  C --> G["Keep top m components"]
  D --> G
  E --> G
  G --> H["Regress Y on PC₁, PC₂, PC₃"]
  H --> I["PCR Model<br/>Uncorrelated regressors"]
```

Worked Example:

Analyst Noemi at Whitfield Capital predicts monthly hedge fund returns using 25 correlated risk factors. OLS with all 25 is unstable (condition number > 500).

PCA on the 25 factors extracts components:

| Component | Variance Explained | Cumulative | Correlation with Y |
|---|---|---|---|
| PC1 | 38.2% | 38.2% | 0.12 |
| PC2 | 18.7% | 56.9% | 0.41 |
| PC3 | 11.3% | 68.2% | 0.05 |
| PC4 | 8.1% | 76.3% | 0.38 |
| PC5 | 5.4% | 81.7% | 0.02 |

Using the standard rule of thumb (keep components explaining roughly 80% of cumulative variance), Noemi selects PC1 through PC5. But PC1, PC3, and PC5 have almost no correlation with the target variable. Meanwhile, PC4 is highly predictive despite explaining only 8.1% of X-variance.

PCR with 5 components: R-squared = 0.22
Using only PC2 and PC4: R-squared = 0.31 (a better fit with fewer components)

The Fundamental Limitation:

PCA is an unsupervised technique: it knows nothing about Y. The components maximizing X-variance may capture market-wide movements that explain predictor covariance but have no relation to the specific response. This is why partial least squares (PLS) was developed as a supervised alternative.

When PCR Works Well:
- When the high-variance components of X happen to also predict Y
- When the primary goal is stabilizing predictions rather than maximizing R-squared
- When dealing with near-singular X'X matrices where OLS fails entirely

When PCR Fails:
- When the predictive signal lives in low-variance components
- When interpretability of individual predictor effects is needed (components are linear combinations of all predictors)
- When a supervised method like PLS would better target the response