AcadiFi · CFA Level II · Quantitative Methods
PCR_Reducer_Noemi · 2026-04-05

How does principal component regression reduce dimensionality, and what are its limitations for prediction?

I'm covering PCR in CFA quant methods. The idea is to extract principal components from the predictors and regress Y on those instead. But the components maximize variance in X, not correlation with Y. Doesn't that mean the most important components for prediction might be ignored?

91 upvotes
AcadiFi Team · Verified Expert · AcadiFi Certified Professional

Principal component regression (PCR) addresses multicollinearity and high dimensionality by replacing the original correlated predictors with a smaller set of uncorrelated principal components. However, you correctly identify its fundamental limitation: the components that capture the most variance in X are not necessarily the most relevant for predicting Y.

PCR Algorithm:

```mermaid
graph TD
    A["Original Predictors<br/>X₁, X₂, ..., X_p (correlated)"] --> B["PCA on X matrix"]
    B --> C["PC₁ explains 45% of X variance"]
    B --> D["PC₂ explains 25% of X variance"]
    B --> E["PC₃ explains 15% of X variance"]
    B --> F["PC₄...PC_p<br/>remaining 15%"]
    C --> G["Keep top m components"]
    D --> G
    E --> G
    G --> H["Regress Y on PC₁, PC₂, PC₃"]
    H --> I["PCR Model<br/>Uncorrelated regressors"]
```

Worked Example:

Analyst Noemi at Whitfield Capital predicts monthly hedge fund returns using 25 correlated risk factors. OLS with all 25 is unstable (condition number > 500).

PCA on the 25 factors extracts components:

| Component | Variance Explained | Cumulative | Correlation with Y |
|---|---|---|---|
| PC1 | 38.2% | 38.2% | 0.12 |
| PC2 | 18.7% | 56.9% | 0.41 |
| PC3 | 11.3% | 68.2% | 0.05 |
| PC4 | 8.1% | 76.3% | 0.38 |
| PC5 | 5.4% | 81.7% | 0.02 |

Using the standard rule (keep components explaining 80% of variance), Noemi selects PC1 through PC5. But PC1, PC3, and PC5 have almost no correlation with the target variable, while PC4 is highly predictive despite explaining only 8.1% of the X-variance.

- PCR with all 5 components: R-squared = 0.22
- PCR with only PC2 and PC4: R-squared = 0.31 (a better fit with fewer components)

The Fundamental Limitation:

PCA is an unsupervised technique: it knows nothing about Y. The components maximizing X-variance may capture market-wide movements that explain predictor covariance but have no relation to the specific response. This is why partial least squares (PLS) was developed as a supervised alternative.

When PCR Works Well:
- When the high-variance components of X happen to also predict Y
- When the primary goal is stabilizing predictions rather than maximizing R-squared
- When dealing with near-singular X'X matrices where OLS fails entirely

When PCR Fails:
- When the predictive signal lives in low-variance components
- When interpretability of individual predictor effects is needed (each component is a linear combination of all predictors)
- When a supervised method like PLS would better target the response

Compare PCR with PLS in our CFA Quantitative Methods course.


Master Level II with our CFA Course

107 lessons · 200+ hours · Expert instruction

#pcr #principal-components #dimensionality-reduction #pca #multicollinearity