AcadiFi
FeatureEng_Sofia · 2026-04-07
CFA · Level II · Quantitative Methods

What are the differences between filter, wrapper, and embedded feature selection methods for financial factor models?

I have 80 candidate features for a return prediction model and need to narrow them down. My CFA quant material mentions filter, wrapper, and embedded methods, but I'm unclear when each is appropriate. Correlation screening seems too simplistic, and forward stepwise selection takes forever. What's the recommended approach for a realistic financial dataset?

85 upvotes
Verified Expert
AcadiFi Certified Professional

Feature selection removes irrelevant or redundant predictors to improve model generalization and interpretability. The three method categories differ in how they interact with the learning algorithm.

Method Classification:

| Category | How It Works | Speed | Model Dependency |
|---|---|---|---|
| Filter | Ranks features by a statistical metric, independently of any model | Fastest | None |
| Wrapper | Evaluates feature subsets by training and testing the actual model | Slowest | Full |
| Embedded | Selection occurs as part of model training | Moderate | Built-in |

Filter Methods:
Compute a relevance score for each feature independently of any model:
- Correlation with target: keep features with |rho| above a threshold
- Mutual information: captures nonlinear relationships
- Variance threshold: removes near-constant features

Example: Flintstone Analytics screens 80 factors for its equity model. A correlation filter with |rho| > 0.05 retains 34 factors; a mutual-information screen adds 6 more with nonlinear predictive power (sentiment extremes, option skew).

Wrapper Methods:
Treat the model as a black box and evaluate candidate subsets:
- Forward selection: start empty, add the best feature one at a time
- Backward elimination: start with all features, remove the least useful
- Recursive feature elimination (RFE): repeatedly retrain the model and drop the weakest feature

Example: wrap a random forest in RFE. Start with 40 features; at each step, retrain and remove the least important one. After 25 iterations, the optimal subset is 15 features with a cross-validated Sharpe ratio of 1.31, versus 0.94 with all 40.

Embedded Methods:
- LASSO regression: the L1 penalty drives some coefficients to exactly zero
- Tree-based importance: random forests or gradient boosting rank features by impurity reduction
- Elastic Net: combines L1 and L2 penalties for grouped selection

Practical Recommendation for 80 Features:
1. Filter first: remove features with near-zero variance and very low correlation (reduces to ~50)
2. Apply embedded selection via LASSO or random forest importance (reduces to ~15-20)
3. Optionally refine with wrapper-based RFE if the computational budget allows

This cascaded approach balances speed with thoroughness and avoids the computational explosion of running wrappers on all 80 features.

Practice feature selection questions in our CFA Quantitative Methods question bank.
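The three-stage cascade above can be sketched with scikit-learn. This is a minimal illustration on synthetic data (the dimensions, thresholds, and random-forest settings below are arbitrary choices, not recommendations); `VarianceThreshold`, `SelectFromModel`, and `RFE` map directly to the filter, embedded, and wrapper stages:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, SelectFromModel, RFE
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor

# Synthetic panel: 500 observations, 80 candidate features, 5 true factors
rng = np.random.default_rng(0)
n, p = 500, 80
X = rng.standard_normal((n, p))
y = X[:, :5] @ rng.standard_normal(5) + 0.5 * rng.standard_normal(n)

# Stage 1 (filter): drop near-constant features, then low-|rho| features
X1 = VarianceThreshold(threshold=1e-4).fit_transform(X)
rho = np.array([np.corrcoef(X1[:, j], y)[0, 1] for j in range(X1.shape[1])])
X2 = X1[:, np.abs(rho) > 0.05]

# Stage 2 (embedded): LASSO zeroes out weak coefficients
lasso = SelectFromModel(LassoCV(cv=5)).fit(X2, y)
X3 = lasso.transform(X2)

# Stage 3 (wrapper, optional): RFE with a random forest,
# dropping the least important feature at each iteration
rfe = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
          n_features_to_select=min(15, X3.shape[1]))
rfe.fit(X3, y)

print("features surviving each stage:",
      X.shape[1], "->", X2.shape[1], "->", X3.shape[1], "->", rfe.n_features_)
```

Ordering matters: the cheap correlation filter runs once per feature, LASSO fits one model, and only the expensive RFE stage retrains the forest repeatedly, by which point the feature count is already small.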


Master Level II with our CFA Course

107 lessons · 200+ hours · Expert instruction

#feature-selection #filter-methods #wrapper-methods #embedded-methods #lasso #factor-model