How do random forests improve on single decision trees?
I understand decision trees, but CFA Level II also covers random forests. How do they fix the overfitting problem of individual trees, and what's the trade-off?
Random forests are an ensemble method that combines many decision trees to produce a more robust and accurate prediction. The core idea is that aggregating many imperfect, individually overfit models produces a single stronger, more stable model.
How it works:
- Bootstrap sampling (bagging): Create B different training sets by sampling with replacement from the original data
- Build a tree on each sample: at each split, consider only a random subset of the features, not all of them. This random feature restriction is the key innovation over plain bagging.
- Aggregate predictions:
  - Classification: majority vote across all trees
  - Regression: average of all tree predictions
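The three steps can be sketched from scratch in a few lines of Python. This is a deliberately minimal illustration, not a real library: each "tree" is a one-split stump, and the random feature subset has size 1, so the structure of the algorithm is visible without the tree-growing machinery. All function names here are made up for the sketch.

```python
# Minimal sketch of a random forest classifier: bootstrap sampling,
# random feature selection, and majority-vote aggregation.
# Trees are one-split stumps for brevity (illustrative only).
import random

def bootstrap_sample(X, y, rng):
    """Step 1 (bagging): sample rows with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y, feature):
    """A one-split 'tree': split at the feature's mean and predict
    the majority class on each side of the split."""
    thresh = sum(row[feature] for row in X) / len(X)
    majority = max(set(y), key=y.count)
    left = [yi for row, yi in zip(X, y) if row[feature] <= thresh] or [majority]
    right = [yi for row, yi in zip(X, y) if row[feature] > thresh] or [majority]
    vote = lambda ys: max(set(ys), key=ys.count)
    return feature, thresh, vote(left), vote(right)

def fit_forest(X, y, B=25, m=1, seed=0):
    """Fit B stumps, each on a bootstrap sample, each restricted to a
    random subset of m features (step 2)."""
    rng = random.Random(seed)
    p = len(X[0])
    forest = []
    for _ in range(B):
        Xb, yb = bootstrap_sample(X, y, rng)     # step 1
        feature = rng.sample(range(p), m)[0]     # step 2
        forest.append(fit_stump(Xb, yb, feature))
    return forest

def predict(forest, row):
    """Step 3: majority vote across all trees."""
    votes = [(l if row[f] <= t else r) for f, t, l, r in forest]
    return max(set(votes), key=votes.count)

# Toy data: both features separate class 0 (small values) from class 1
X = [[0, 0], [1, 1], [2, 1], [8, 9], [9, 8], [10, 9]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [9, 8]))   # votes across 25 stumps
```

Any single stump here is a poor model, but the majority vote over 25 bootstrapped, feature-restricted stumps classifies the toy data reliably.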
Why random feature selection matters:
If one feature is very strong (e.g., credit score for default prediction), every single bagged tree will split on that feature first, making all trees correlated. Random feature selection forces trees to use different variables, decorrelating them. Averaging decorrelated trees reduces variance much more than averaging correlated ones.
Single tree vs. random forest:
| Feature | Single Tree | Random Forest |
|---|---|---|
| Interpretability | High (clear decision path) | Low (hundreds of trees) |
| Variance (overfitting) | High | Low |
| Bias | Low (can fit complex patterns) | Slightly higher |
| Accuracy | Lower | Higher |
| Speed | Fast | Slower |
| Stability | Unstable (small data changes → different tree) | Stable |
Financial example:
Atlas Credit builds a random forest with 500 trees to predict corporate bond defaults. Each tree is trained on a bootstrap sample of 10,000 historical bonds, and at each split, only 4 of 15 available ratios are considered.
For a new bond, 340 trees predict "no default" and 160 predict "default." The random forest prediction is "no default," with 340/500 = 68% of trees agreeing.
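The aggregation step in this example is just counting votes:

```python
# Majority-vote aggregation for the bond-default example above.
votes_no_default, votes_default = 340, 160
total = votes_no_default + votes_default          # 500 trees
prediction = "no default" if votes_no_default > votes_default else "default"
confidence = votes_no_default / total
print(prediction, confidence)   # no default 0.68
```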
Hyperparameters to tune:
- Number of trees (B): More trees → lower variance, but diminishing returns after ~500
- Number of features per split (m): Commonly √p for classification, p/3 for regression (where p = total features)
- Maximum depth: Controls individual tree complexity
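The common rules of thumb for m line up with the Atlas Credit example: with p = 15 features, the square root of p rounds to 4, matching the "4 of 15 ratios" setting above. A quick check:

```python
# Rule-of-thumb feature-subset sizes (p = total number of features).
import math

p = 15                                    # features in the example above
m_classification = round(math.sqrt(p))    # sqrt(p) -> 4
m_regression = p // 3                     # p/3   -> 5
print(m_classification, m_regression)     # 4 5
```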
Feature importance: A side benefit of random forests is a measure of which features matter most, based on how much prediction accuracy decreases when a feature's values are randomly shuffled (permutation importance).
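Here is a minimal sketch of that shuffle-and-measure idea. To keep it self-contained, the "model" is a fixed toy rule that only looks at feature 0, so shuffling feature 1 should show zero importance; a real random forest would be used in place of `model`.

```python
# Permutation importance sketch: shuffle one feature's column and
# measure the drop in accuracy. The model is a fixed toy rule
# (threshold on feature 0), standing in for a trained forest.
import random

def model(row):
    return 1 if row[0] > 5 else 0   # uses only feature 0

X = [[0, 7], [2, 1], [3, 9], [8, 2], [9, 8], [10, 3]]
y = [0, 0, 0, 1, 1, 1]

def accuracy(X, y):
    return sum(model(r) == yi for r, yi in zip(X, y)) / len(y)

base = accuracy(X, y)               # 1.0 on this toy data
rng = random.Random(42)
importances = {}
for f in range(2):
    col = [row[f] for row in X]
    rng.shuffle(col)                # destroy feature f's information
    X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
    importances[f] = base - accuracy(X_perm, y)
    print(f"feature {f}: importance = {importances[f]:.2f}")
# feature 1 comes out at 0.00 because the model never uses it
```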
Trade-off: You gain accuracy and stability but lose interpretability. A single decision tree can explain exactly why a loan was rejected; a random forest gives a probability but the reasoning is opaque.
Exam tip: CFA Level II emphasizes the bias-variance trade-off. Random forests reduce variance (overfitting) at the cost of some interpretability. Know the concepts of bagging and random feature selection.