How do random forests improve on single decision trees?
I understand decision trees, but CFA Level II also covers random forests. How do they fix the overfitting problem of individual trees, and what's the trade-off?
Random forests are an ensemble method that combines many decision trees to produce a more robust and accurate prediction. The core idea is that aggregating many imperfect, individually overfit models produces a single stronger, more stable model.
How it works:
- Bootstrap sampling (bagging): Create B different training sets by sampling with replacement from the original data
- Build a tree on each sample: at each split, consider only a random subset of the features, not all of them. This random feature restriction is the key innovation over plain bagging.
- Aggregate predictions:
  - Classification: majority vote across all trees
  - Regression: average of all tree predictions
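The three steps can be sketched from scratch in a few lines of Python. This is a deliberately minimal illustration, not a real library: each "tree" is a one-split stump, and the random feature subset has size 1, so the structure of the algorithm is visible without the tree-growing machinery. All function names here are made up for the sketch.

```python
# Minimal sketch of a random forest classifier: bootstrap sampling,
# random feature selection, and majority-vote aggregation.
# Trees are one-split stumps for brevity (illustrative only).
import random

def bootstrap_sample(X, y, rng):
    """Step 1 (bagging): sample rows with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y, feature):
    """A one-split 'tree': split at the feature's mean and predict
    the majority class on each side of the split."""
    thresh = sum(row[feature] for row in X) / len(X)
    majority = max(set(y), key=y.count)
    left = [yi for row, yi in zip(X, y) if row[feature] <= thresh] or [majority]
    right = [yi for row, yi in zip(X, y) if row[feature] > thresh] or [majority]
    vote = lambda ys: max(set(ys), key=ys.count)
    return feature, thresh, vote(left), vote(right)

def fit_forest(X, y, B=25, m=1, seed=0):
    """Fit B stumps, each on a bootstrap sample, each restricted to a
    random subset of m features (step 2)."""
    rng = random.Random(seed)
    p = len(X[0])
    forest = []
    for _ in range(B):
        Xb, yb = bootstrap_sample(X, y, rng)     # step 1
        feature = rng.sample(range(p), m)[0]     # step 2
        forest.append(fit_stump(Xb, yb, feature))
    return forest

def predict(forest, row):
    """Step 3: majority vote across all trees."""
    votes = [(l if row[f] <= t else r) for f, t, l, r in forest]
    return max(set(votes), key=votes.count)

# Toy data: both features separate class 0 (small values) from class 1
X = [[0, 0], [1, 1], [2, 1], [8, 9], [9, 8], [10, 9]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [9, 8]))   # votes across 25 stumps
```

Any single stump here is a poor model, but the majority vote over 25 bootstrapped, feature-restricted stumps classifies the toy data reliably.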
Why random feature selection matters:
If one feature is very strong (e.g., credit score for default prediction), every single bagged tree will split on that feature first, making all trees correlated. Random feature selection forces trees to use different variables, decorrelating them. Averaging decorrelated trees reduces variance much more than averaging correlated ones.
Single tree vs. random forest:
| Feature | Single Tree | Random Forest |
|---|---|---|
| Interpretability | High (clear decision path) | Low (hundreds of trees) |
| Variance (overfitting) | High | Low |
| Bias | Low (can fit complex patterns) | Slightly higher |
| Accuracy | Lower | Higher |
| Speed | Fast | Slower |
| Stability | Unstable (small data changes → different tree) | Stable |
Financial example:
Atlas Credit builds a random forest with 500 trees to predict corporate bond defaults. Each tree is trained on a bootstrap sample of 10,000 historical bonds, and at each split, only 4 of 15 available ratios are considered.
For a new bond, 340 trees predict "no default" and 160 predict "default." The random forest prediction is "no default," with 340/500 = 68% of trees agreeing.
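The aggregation step in this example is just counting votes:

```python
# Majority-vote aggregation for the bond-default example above.
votes_no_default, votes_default = 340, 160
total = votes_no_default + votes_default          # 500 trees
prediction = "no default" if votes_no_default > votes_default else "default"
confidence = votes_no_default / total
print(prediction, confidence)   # no default 0.68
```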
Hyperparameters to tune:
- Number of trees (B): More trees → lower variance, but diminishing returns after ~500
- Number of features per split (m): Commonly √p for classification, p/3 for regression (where p = total features)
- Maximum depth: Controls individual tree complexity
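The common rules of thumb for m line up with the Atlas Credit example: with p = 15 features, the square root of p rounds to 4, matching the "4 of 15 ratios" setting above. A quick check:

```python
# Rule-of-thumb feature-subset sizes (p = total number of features).
import math

p = 15                                    # features in the example above
m_classification = round(math.sqrt(p))    # sqrt(p) -> 4
m_regression = p // 3                     # p/3   -> 5
print(m_classification, m_regression)     # 4 5
```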
Feature importance: A side benefit of random forests is a measure of which features matter most, based on how much prediction accuracy decreases when a feature's values are randomly shuffled (permutation importance).
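Here is a minimal sketch of that shuffle-and-measure idea. To keep it self-contained, the "model" is a fixed toy rule that only looks at feature 0, so shuffling feature 1 should show zero importance; a real random forest would be used in place of `model`.

```python
# Permutation importance sketch: shuffle one feature's column and
# measure the drop in accuracy. The model is a fixed toy rule
# (threshold on feature 0), standing in for a trained forest.
import random

def model(row):
    return 1 if row[0] > 5 else 0   # uses only feature 0

X = [[0, 7], [2, 1], [3, 9], [8, 2], [9, 8], [10, 3]]
y = [0, 0, 0, 1, 1, 1]

def accuracy(X, y):
    return sum(model(r) == yi for r, yi in zip(X, y)) / len(y)

base = accuracy(X, y)               # 1.0 on this toy data
rng = random.Random(42)
importances = {}
for f in range(2):
    col = [row[f] for row in X]
    rng.shuffle(col)                # destroy feature f's information
    X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
    importances[f] = base - accuracy(X_perm, y)
    print(f"feature {f}: importance = {importances[f]:.2f}")
# feature 1 comes out at 0.00 because the model never uses it
```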
Trade-off: You gain accuracy and stability but lose interpretability. A single decision tree can explain exactly why a loan was rejected; a random forest gives a probability but the reasoning is opaque.
Exam tip: CFA Level II emphasizes the bias-variance trade-off. Random forests reduce variance (overfitting) at the cost of some interpretability. Know the concepts of bagging and random feature selection.