How does K-fold cross-validation prevent overfitting, and how do you choose the right K?
I'm reviewing model selection in CFA quant methods. The curriculum says K-fold cross-validation splits data into K subsets and rotates which one is the test set. But won't a larger K always be better since we use more training data? And how does this relate to the bias-variance tradeoff in model evaluation?
K-fold cross-validation divides the dataset into K equally sized folds, trains the model on K-1 folds, and validates on the remaining fold. This rotation repeats K times so that every observation serves once as test data and K-1 times as training data. The average validation error across the K folds provides a robust estimate of out-of-sample performance. Strictly speaking, cross-validation does not prevent overfitting during training; it exposes overfitting by estimating out-of-sample error, which lets you select the model that generalizes best.

```mermaid
graph TD
    A["Full Dataset<br/>N observations"] --> B["Split into K=5 Folds"]
    B --> C["Iteration 1: Train on Folds 2-5<br/>Test on Fold 1"]
    B --> D["Iteration 2: Train on Folds 1,3-5<br/>Test on Fold 2"]
    B --> E["Iteration 3: Train on Folds 1-2,4-5<br/>Test on Fold 3"]
    B --> F["Iteration 4: Train on Folds 1-3,5<br/>Test on Fold 4"]
    B --> G["Iteration 5: Train on Folds 1-4<br/>Test on Fold 5"]
    C --> H["Error₁"]
    D --> I["Error₂"]
    E --> J["Error₃"]
    F --> K["Error₄"]
    G --> L["Error₅"]
    H --> M["CV Error = Average(Error₁...Error₅)"]
    I --> M
    J --> M
    K --> M
    L --> M
```

Worked Example:

Researcher Adeline at Thornbury Analytics builds two models to predict corporate bond spreads using 200 observations. Model A uses 3 predictors (leverage, interest coverage, rating). Model B uses 12 predictors including interaction terms.

Using 5-fold cross-validation (each fold has 40 observations):

| Fold | Model A RMSE | Model B RMSE |
|---|---|---|
| 1 | 42 bps | 38 bps |
| 2 | 45 bps | 61 bps |
| 3 | 39 bps | 55 bps |
| 4 | 44 bps | 47 bps |
| 5 | 41 bps | 58 bps |
| Average | 42.2 bps | 51.8 bps |

Model A has the lower and more stable CV error despite a worse in-sample fit. Model B overfits: it performs well on some folds but poorly on others. Adeline selects Model A.

Choosing K:

A larger K is not always better. Increasing K reduces bias but raises variance and computation cost:

- K = 5 or K = 10 are standard choices that balance bias and variance.
- Small K (e.g., K = 2): high bias, because each training set contains only half the data; low variance across folds.
- Large K (e.g., K = n, leave-one-out): low bias, because each training set is nearly full-sized; high variance, because the training sets overlap almost completely, so the fold errors are highly correlated.
- K = 10 is the most common choice in practice and the one generally recommended on the CFA exam.

Key Distinction from Train-Test Split:
A single train-test split wastes data (the holdout never informs training) and yields a noisy estimate that depends on one arbitrary split. K-fold uses every observation for both training and validation, giving a more reliable performance metric.
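The rotation in the diagram can be sketched in plain Python. This is a minimal illustration, not a production routine: `kfold_indices` and `cross_val_rmse` are hypothetical helper names, and the "model" here is just the training-set mean of y, standing in for any real estimator.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_size, remainder = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_val_rmse(y, k=5):
    """Average RMSE over k folds for a mean-predictor baseline 'model'."""
    folds = kfold_indices(len(y), k)
    errors = []
    for i in range(k):
        test = folds[i]
        # Training set = every fold except fold i (the rotation step)
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        # "Fit": the model's prediction is simply the training-set mean
        y_hat = sum(y[j] for j in train) / len(train)
        mse = sum((y[j] - y_hat) ** 2 for j in test) / len(test)
        errors.append(math.sqrt(mse))
    # CV error = average of the k fold errors
    return sum(errors) / k
```

Swapping the mean predictor for an actual regression fit inside the loop gives the full procedure; the fold bookkeeping stays identical.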
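Adeline's comparison can also be replicated on simulated data. Everything below is a hypothetical setup for illustration (the coefficients, noise level, and function name `cv_rmse` are assumptions, not from the curriculum): the "true" spread depends on only 3 predictors, while Model B also fits 9 pure-noise columns.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: 200 bond observations, spread (in bps) driven by
# 3 real predictors plus Gaussian noise; columns 3-11 are pure noise.
n = 200
X = rng.normal(size=(n, 12))
true_beta = np.array([30.0, -20.0, 15.0] + [0.0] * 9)
y = X @ true_beta + rng.normal(scale=40.0, size=n)

def cv_rmse(X, y, k=5):
    """k-fold CV RMSE for an OLS model on the given design matrix."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    rmses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate(folds[:i] + folds[i + 1:])
        # Fit OLS on the k-1 training folds, score on the held-out fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ beta
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(rmses))

rmse_a = cv_rmse(X[:, :3], y)   # Model A: the 3 informative predictors
rmse_b = cv_rmse(X, y)          # Model B: all 12, including 9 noise columns
```

On data like this, Model B's extra noise predictors tend to inflate its CV error relative to Model A, mirroring the pattern in the table above, though the exact gap varies with the random draw.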