What are the differences between filter, wrapper, and embedded feature selection methods for financial factor models?
I have 80 candidate features for a return prediction model and need to narrow them down. My CFA quant material mentions filter, wrapper, and embedded methods, but I'm unclear when each is appropriate. Correlation screening seems too simplistic, and forward stepwise selection takes forever. What's the recommended approach for a realistic financial dataset?
Feature selection removes irrelevant or redundant predictors to improve model generalization and interpretability. The three method categories differ in how they interact with the learning algorithm.

Method Classification:

| Category | How It Works | Speed | Model Dependency |
|---|---|---|---|
| Filter | Ranks features by a statistical metric, independently of any model | Fastest | None |
| Wrapper | Evaluates feature subsets by training and testing the actual model | Slowest | Full |
| Embedded | Selection occurs during model training itself | Moderate | Built-in |

Filter Methods:
Compute a relevance score for each feature independently:
- Correlation with the target: keep features with |ρ| above a threshold
- Mutual information: captures nonlinear relationships
- Variance threshold: removes near-constant features

Example: Flintstone Analytics screens 80 factors for its equity model. A correlation filter with |ρ| > 0.05 retains 34 factors; mutual information adds 6 more with nonlinear predictive power (sentiment extremes, option skew).

Wrapper Methods:
Treat the model as a black box and evaluate candidate subsets:
- Forward selection: start empty, add the best feature one at a time
- Backward elimination: start with all features, remove the least useful one at a time
- Recursive feature elimination (RFE): repeatedly retrain the model and drop the weakest feature

Continuing the example, wrap a random forest in RFE: start with the 40 filtered factors and, at each iteration, retrain and remove the least important one. After 25 iterations the optimal subset is 15 features, with a cross-validated Sharpe ratio of 1.31 versus 0.94 using all 40.

Embedded Methods:
- LASSO regression: the L1 penalty drives weak coefficients to exactly zero
- Tree-based importance: random forests or gradient boosting rank features by impurity reduction
- Elastic Net: combines L1 and L2 penalties for grouped selection

Practical Recommendation for 80 Features:
1. Filter first: remove features with near-zero variance or very low correlation (reduces to ~50)
2. Apply embedded selection via LASSO or random forest importance (reduces to ~15-20)
3. Optionally refine with wrapper-based RFE if the computational budget allows

This cascaded approach balances speed with thoroughness and avoids the computational explosion of running wrappers on all 80 features.
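The filter stage above can be sketched with scikit-learn. This is a minimal illustration on synthetic data: the matrix shapes, the 0.05 correlation cutoff, and the top-quartile mutual-information rule are assumptions for the example, not prescriptions.

```python
# Sketch of the filter stage, assuming an (n_samples, 80) feature matrix X
# and next-period returns y. All data here is synthetic; thresholds are
# illustrative, not canonical.
import numpy as np
from sklearn.feature_selection import VarianceThreshold, mutual_info_regression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 80))            # 80 candidate factors (synthetic)
# Feature 0 is linearly predictive; feature 1 is purely nonlinear (quadratic),
# so a correlation screen alone would miss it.
y = 0.3 * X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(500)

# 1) Variance threshold: drop near-constant features
vt = VarianceThreshold(threshold=1e-6)
X_v = vt.fit_transform(X)
kept = vt.get_support(indices=True)

# 2) Correlation filter: keep features with |rho| > 0.05 against the target
corr = np.array([np.corrcoef(X_v[:, j], y)[0, 1] for j in range(X_v.shape[1])])
corr_mask = np.abs(corr) > 0.05

# 3) Mutual information: rescue nonlinear predictors the correlation screen misses
mi = mutual_info_regression(X_v, y, random_state=0)
mi_mask = mi > np.quantile(mi, 0.75)          # keep the top MI quartile

selected = kept[corr_mask | mi_mask]
print(f"{len(selected)} of 80 features survive the filter stage")
```

Note that the union of the two masks mirrors the worked example: correlation keeps the linear factors, and mutual information adds back nonlinear ones (here, feature 1) that have near-zero correlation with the target.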
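The embedded and wrapper stages of the cascade can be sketched the same way. Again a hedged example: `X_f` stands in for the ~50 filtered features, and the planted coefficients, `LassoCV` settings, and 15-feature RFE target are illustrative assumptions.

```python
# Sketch of the embedded (LASSO) and wrapper (RFE) stages on synthetic data.
# Names, sizes, and hyperparameters are illustrative, not from the source.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X_f = rng.standard_normal((500, 50))          # ~50 features surviving the filter
beta = np.zeros(50)
beta[:5] = [0.5, -0.4, 0.3, 0.3, -0.2]        # only 5 truly informative factors
y = X_f @ beta + 0.5 * rng.standard_normal(500)

# Embedded: the L1 penalty drives weak coefficients to exactly zero,
# so the surviving nonzero coefficients are the selected features.
lasso = LassoCV(cv=5, random_state=0).fit(X_f, y)
embedded_idx = np.flatnonzero(lasso.coef_)
print(f"LASSO keeps {embedded_idx.size} features")

# Wrapper: RFE retrains a random forest and drops the weakest feature
# each round until the target subset size is reached.
rfe = RFE(
    RandomForestRegressor(n_estimators=100, random_state=0),
    n_features_to_select=min(15, embedded_idx.size),
    step=1,
).fit(X_f[:, embedded_idx], y)
final_idx = embedded_idx[rfe.get_support(indices=True)]
print(f"Final subset: {sorted(final_idx.tolist())}")
```

Running RFE only on the LASSO survivors, rather than all 50 features, is what keeps the wrapper step affordable in the cascaded approach.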