How does partial least squares differ from PCR by incorporating the response variable into component extraction?
I just learned about PCR's limitation that principal components don't consider Y. The curriculum says PLS fixes this by finding components that maximize covariance between X and Y. Can someone explain the PLS algorithm step by step and show why this supervised approach tends to outperform PCR?
Partial least squares (PLS) constructs latent components that simultaneously capture variance in X and maximize covariance with Y. Unlike PCR, which blindly finds the directions of greatest spread in the predictors, PLS finds directions that are both informative about X and predictive of Y.

PLS vs. PCR:

- PCR maximizes: Var(Xw), where w is the component weight vector
- PLS maximizes: Cov(Y, Xw)^2 = [Cor(Y, Xw)]^2 × Var(Xw) × Var(Y)

PLS balances high X-variance against high Y-correlation. A direction in X-space that explains only modest X-variance but strongly predicts Y will be favored over a high-variance direction uncorrelated with Y.

Simplified PLS Algorithm:

1. Compute the weight vector w_1 by regressing each column of X on Y, then normalizing it
2. Extract the first PLS component: T_1 = Xw_1
3. Regress Y on T_1 and store the coefficient
4. Deflate X by removing its projection onto T_1
5. Repeat for additional components using the deflated X
6. Choose the number of components via cross-validation

Worked Example:

Researcher Callum at Maplethorn Analytics predicts corporate bond excess returns using 18 financial and macroeconomic variables. With only 60 monthly observations, p/n = 0.30 creates instability.

Comparing approaches (5-fold cross-validation RMSE in basis points):

| Method | Components/Variables | CV RMSE |
|---|---|---|
| OLS (all 18) | 18 | 142 bps |
| PCR (5 components) | 5 | 108 bps |
| PCR (3 components) | 3 | 115 bps |
| PLS (3 components) | 3 | 89 bps |
| PLS (2 components) | 2 | 84 bps |

PLS with just 2 components outperforms PCR with 5 because those 2 PLS components are specifically constructed to predict Y.
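The six steps above can be sketched as a minimal NIPALS-style PLS1 fit (single response) in NumPy. This is an illustrative sketch on made-up data, not production code; in practice you would use a tested library such as scikit-learn's `PLSRegression`:

```python
# Minimal NIPALS-style PLS1 sketch (single response, synthetic data).
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS via component-wise deflation."""
    X = X - X.mean(axis=0)          # center predictors
    y = y - y.mean()                # center response
    weights, scores, coefs = [], [], []
    for _ in range(n_components):
        # Step 1: weights proportional to Cov(each column of X, y)
        w = X.T @ y
        w /= np.linalg.norm(w)      # normalize the weight vector
        # Step 2: extract the component (score vector)
        t = X @ w
        # Step 3: regress y on t and store the coefficient
        q = (t @ y) / (t @ t)
        # Step 4: deflate X by removing its projection onto t
        p = (X.T @ t) / (t @ t)
        X = X - np.outer(t, p)
        y = y - q * t               # deflate y as well (PLS1 convention)
        weights.append(w); scores.append(t); coefs.append(q)
    return weights, scores, coefs

# Tiny synthetic example: 60 observations, 18 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 18))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=60)
W, T, Q = pls1_fit(X, y, n_components=2)
print(len(W), T[0].shape)  # 2 components, each score vector of length 60
```

A useful check of the deflation step: successive score vectors come out orthogonal, which is what lets each new component pick up signal the earlier ones missed.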
The first PLS component captures the combination of credit spread, term slope, and equity volatility that drives bond returns, even though these variables individually explain less total X-variance than the market-wide factor captured by PC1.

Advantages of PLS:

- Works well with many predictors relative to observations (high p/n ratio)
- Typically needs fewer components for good prediction
- Handles multicollinearity without discarding predictive signal
- Components remain interpretable as weighted combinations of the original variables

Limitations:

- Can overfit if too many components are retained (always use cross-validation)
- Less mathematically elegant than PCR (no clean eigendecomposition)
- Not as widely implemented in basic statistical software

CFA Exam Comparison:

- PCR: unsupervised dimensionality reduction, then regression
- PLS: supervised dimensionality reduction, targets prediction directly
- Both address multicollinearity; PLS usually needs fewer components