How does K-fold cross-validation prevent overfitting, and how do you choose the right K?
I'm reviewing model selection in CFA quant methods. The curriculum says K-fold cross-validation splits data into K subsets and rotates which one is the test set. But won't a larger K always be better since we use more training data? And how does this relate to the bias-variance tradeoff in model evaluation?
K-fold cross-validation divides the dataset into K equally sized folds, trains the model on K-1 folds, and validates on the remaining fold. This rotation repeats K times so that every observation serves once as test data and K-1 times as training data. The average validation error across the K folds provides a robust estimate of out-of-sample performance. Strictly speaking, cross-validation does not prevent overfitting during training; it exposes overfitting by estimating out-of-sample error, which lets you select the model that generalizes best.

```mermaid
graph TD
    A["Full Dataset<br/>N observations"] --> B["Split into K=5 Folds"]
    B --> C["Iteration 1: Train on Folds 2-5<br/>Test on Fold 1"]
    B --> D["Iteration 2: Train on Folds 1,3-5<br/>Test on Fold 2"]
    B --> E["Iteration 3: Train on Folds 1-2,4-5<br/>Test on Fold 3"]
    B --> F["Iteration 4: Train on Folds 1-3,5<br/>Test on Fold 4"]
    B --> G["Iteration 5: Train on Folds 1-4<br/>Test on Fold 5"]
    C --> H["Error₁"]
    D --> I["Error₂"]
    E --> J["Error₃"]
    F --> K["Error₄"]
    G --> L["Error₅"]
    H --> M["CV Error = Average(Error₁...Error₅)"]
    I --> M
    J --> M
    K --> M
    L --> M
```

Worked Example:

Researcher Adeline at Thornbury Analytics builds two models to predict corporate bond spreads using 200 observations. Model A uses 3 predictors (leverage, interest coverage, rating). Model B uses 12 predictors including interaction terms.

Using 5-fold cross-validation (each fold has 40 observations):

| Fold | Model A RMSE | Model B RMSE |
|---|---|---|
| 1 | 42 bps | 38 bps |
| 2 | 45 bps | 61 bps |
| 3 | 39 bps | 55 bps |
| 4 | 44 bps | 47 bps |
| 5 | 41 bps | 58 bps |
| Average | 42.2 bps | 51.8 bps |

Model A has the lower and more stable CV error despite a worse in-sample fit. Model B overfits: it performs well on some folds but poorly on others. Adeline selects Model A.

Choosing K:

A larger K is not always better. Increasing K reduces bias but raises variance and computation cost:

- K = 5 or K = 10 are standard choices that balance bias and variance.
- Small K (e.g., K = 2): high bias, because each training set contains only half the data; low variance across folds.
- Large K (e.g., K = n, leave-one-out): low bias, because each training set is nearly full-sized; high variance, because the training sets overlap almost completely, so the fold errors are highly correlated.
- K = 10 is the most common choice in practice and the one generally recommended on the CFA exam.

Key Distinction from Train-Test Split:
A single train-test split wastes data (the holdout never informs training) and yields a noisy estimate that depends on one arbitrary split. K-fold uses every observation for both training and validation, giving a more reliable performance metric.
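The rotation in the diagram can be sketched in plain Python. This is a minimal illustration, not a production routine: `kfold_indices` and `cross_val_rmse` are hypothetical helper names, and the "model" here is just the training-set mean of y, standing in for any real estimator.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_size, remainder = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_val_rmse(y, k=5):
    """Average RMSE over k folds for a mean-predictor baseline 'model'."""
    folds = kfold_indices(len(y), k)
    errors = []
    for i in range(k):
        test = folds[i]
        # Training set = every fold except fold i (the rotation step)
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        # "Fit": the model's prediction is simply the training-set mean
        y_hat = sum(y[j] for j in train) / len(train)
        mse = sum((y[j] - y_hat) ** 2 for j in test) / len(test)
        errors.append(math.sqrt(mse))
    # CV error = average of the k fold errors
    return sum(errors) / k
```

Swapping the mean predictor for an actual regression fit inside the loop gives the full procedure; the fold bookkeeping stays identical.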
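Adeline's comparison can also be replicated on simulated data. Everything below is a hypothetical setup for illustration (the coefficients, noise level, and function name `cv_rmse` are assumptions, not from the curriculum): the "true" spread depends on only 3 predictors, while Model B also fits 9 pure-noise columns.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: 200 bond observations, spread (in bps) driven by
# 3 real predictors plus Gaussian noise; columns 3-11 are pure noise.
n = 200
X = rng.normal(size=(n, 12))
true_beta = np.array([30.0, -20.0, 15.0] + [0.0] * 9)
y = X @ true_beta + rng.normal(scale=40.0, size=n)

def cv_rmse(X, y, k=5):
    """k-fold CV RMSE for an OLS model on the given design matrix."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    rmses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate(folds[:i] + folds[i + 1:])
        # Fit OLS on the k-1 training folds, score on the held-out fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ beta
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(rmses))

rmse_a = cv_rmse(X[:, :3], y)   # Model A: the 3 informative predictors
rmse_b = cv_rmse(X, y)          # Model B: all 12, including 9 noise columns
```

On data like this, Model B's extra noise predictors tend to inflate its CV error relative to Model A, mirroring the pattern in the table above, though the exact gap varies with the random draw.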