What are the differences between filter, wrapper, and embedded feature selection methods for financial factor models?
I have 80 candidate features for a return prediction model and need to narrow them down. My CFA quant material mentions filter, wrapper, and embedded methods, but I'm unclear when each is appropriate. Correlation screening seems too simplistic, and forward stepwise selection takes forever. What's the recommended approach for a realistic financial dataset?
Feature selection removes irrelevant or redundant predictors to improve model generalization and interpretability. The three method categories differ in how they interact with the learning algorithm.

Method Classification:

| Category | How It Works | Speed | Model Dependency |
|---|---|---|---|
| Filter | Ranks features by a statistical metric, independently of any model | Fastest | None |
| Wrapper | Evaluates feature subsets by training and testing the actual model | Slowest | Full |
| Embedded | Selection occurs during model training itself | Moderate | Built-in |

Filter Methods:
Compute a relevance score for each feature independently:
- Correlation with the target: keep features with |ρ| above a threshold
- Mutual information: captures nonlinear relationships
- Variance threshold: removes near-constant features

Example: Flintstone Analytics screens 80 factors for its equity model. A correlation filter with |ρ| > 0.05 retains 34 factors; mutual information adds 6 more with nonlinear predictive power (sentiment extremes, option skew).

Wrapper Methods:
Treat the model as a black box and evaluate candidate subsets:
- Forward selection: start empty, add the best feature one at a time
- Backward elimination: start with all features, remove the least useful one at a time
- Recursive feature elimination (RFE): repeatedly retrain the model and drop the weakest feature

Continuing the example, wrap a random forest in RFE: start with the 40 filtered factors and, at each iteration, retrain and remove the least important one. After 25 iterations the optimal subset is 15 features, with a cross-validated Sharpe ratio of 1.31 versus 0.94 using all 40.

Embedded Methods:
- LASSO regression: the L1 penalty drives weak coefficients to exactly zero
- Tree-based importance: random forests or gradient boosting rank features by impurity reduction
- Elastic Net: combines L1 and L2 penalties for grouped selection

Practical Recommendation for 80 Features:
1. Filter first: remove features with near-zero variance or very low correlation (reduces to ~50)
2. Apply embedded selection via LASSO or random forest importance (reduces to ~15-20)
3. Optionally refine with wrapper-based RFE if the computational budget allows

This cascaded approach balances speed with thoroughness and avoids the computational explosion of running wrappers on all 80 features.
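The filter stage above can be sketched with scikit-learn. This is a minimal illustration on synthetic data: the matrix shapes, the 0.05 correlation cutoff, and the top-quartile mutual-information rule are assumptions for the example, not prescriptions.

```python
# Sketch of the filter stage, assuming an (n_samples, 80) feature matrix X
# and next-period returns y. All data here is synthetic; thresholds are
# illustrative, not canonical.
import numpy as np
from sklearn.feature_selection import VarianceThreshold, mutual_info_regression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 80))            # 80 candidate factors (synthetic)
# Feature 0 is linearly predictive; feature 1 is purely nonlinear (quadratic),
# so a correlation screen alone would miss it.
y = 0.3 * X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(500)

# 1) Variance threshold: drop near-constant features
vt = VarianceThreshold(threshold=1e-6)
X_v = vt.fit_transform(X)
kept = vt.get_support(indices=True)

# 2) Correlation filter: keep features with |rho| > 0.05 against the target
corr = np.array([np.corrcoef(X_v[:, j], y)[0, 1] for j in range(X_v.shape[1])])
corr_mask = np.abs(corr) > 0.05

# 3) Mutual information: rescue nonlinear predictors the correlation screen misses
mi = mutual_info_regression(X_v, y, random_state=0)
mi_mask = mi > np.quantile(mi, 0.75)          # keep the top MI quartile

selected = kept[corr_mask | mi_mask]
print(f"{len(selected)} of 80 features survive the filter stage")
```

Note that the union of the two masks mirrors the worked example: correlation keeps the linear factors, and mutual information adds back nonlinear ones (here, feature 1) that have near-zero correlation with the target.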
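The embedded and wrapper stages of the cascade can be sketched the same way. Again a hedged example: `X_f` stands in for the ~50 filtered features, and the planted coefficients, `LassoCV` settings, and 15-feature RFE target are illustrative assumptions.

```python
# Sketch of the embedded (LASSO) and wrapper (RFE) stages on synthetic data.
# Names, sizes, and hyperparameters are illustrative, not from the source.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X_f = rng.standard_normal((500, 50))          # ~50 features surviving the filter
beta = np.zeros(50)
beta[:5] = [0.5, -0.4, 0.3, 0.3, -0.2]        # only 5 truly informative factors
y = X_f @ beta + 0.5 * rng.standard_normal(500)

# Embedded: the L1 penalty drives weak coefficients to exactly zero,
# so the surviving nonzero coefficients are the selected features.
lasso = LassoCV(cv=5, random_state=0).fit(X_f, y)
embedded_idx = np.flatnonzero(lasso.coef_)
print(f"LASSO keeps {embedded_idx.size} features")

# Wrapper: RFE retrains a random forest and drops the weakest feature
# each round until the target subset size is reached.
rfe = RFE(
    RandomForestRegressor(n_estimators=100, random_state=0),
    n_features_to_select=min(15, embedded_idx.size),
    step=1,
).fit(X_f[:, embedded_idx], y)
final_idx = embedded_idx[rfe.get_support(indices=True)]
print(f"Final subset: {sorted(final_idx.tolist())}")
```

Running RFE only on the LASSO survivors, rather than all 50 features, is what keeps the wrapper step affordable in the cascaded approach.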