What's the difference between supervised and unsupervised learning, and how are they used in finance?
CFA Level II now covers machine learning basics. I get that supervised learning uses labeled data and unsupervised doesn't, but I'm unclear on practical finance applications. When would a portfolio manager or analyst use each type?
Machine learning (ML) in finance is a growing topic on the CFA exam. The fundamental distinction between supervised and unsupervised learning is whether you have a target variable (label) to predict.
Supervised Learning: You have input features (X) and a known output (Y). The algorithm learns the relationship X -> Y from historical data, then predicts Y for new data.
Finance Applications:
- Credit scoring: Predict default probability (Y = default/no default) from borrower characteristics (income, debt ratio, credit history)
- Stock return prediction: Predict next-month return from factors (value, momentum, quality)
- Fraud detection: Classify transactions as fraudulent or legitimate
- Earnings forecasting: Predict quarterly EPS from fundamental and market data
Common Algorithms:
- Linear/logistic regression (simplest, most interpretable)
- Decision trees and random forests
- Support vector machines
- Neural networks (most flexible, least interpretable)
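To make the X -> Y idea concrete, here is a minimal sketch of the credit-scoring case using logistic regression fitted by plain gradient descent. The borrower data is simulated, and the feature names (`debt_ratio`, `income_z`) and the data-generating process are assumptions for illustration only. In practice an analyst would more likely use a library such as scikit-learn.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature weights
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Simulated borrowers (hypothetical): X = [debt_ratio, income_z], Y = 1 if default
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    debt_ratio = rng.uniform(0.0, 1.0)
    income_z = rng.gauss(0.0, 1.0)  # standardized income
    # Assumed process: default is more likely with high debt and low income
    true_p = sigmoid(4.0 * (debt_ratio - 0.5) - 1.0 * income_z)
    X.append([debt_ratio, income_z])
    y.append(1 if rng.random() < true_p else 0)

# Learn from the first 300 borrowers, then score the 100 held-out borrowers
w = fit_logistic(X[:300], y[:300])
test_preds = [1 if predict_proba(w, x) >= 0.5 else 0 for x in X[300:]]
accuracy = sum(p == t for p, t in zip(test_preds, y[300:])) / 100
print("weights:", w)
print("hold-out accuracy:", accuracy)
```

The key supervised-learning feature is the hold-out check at the end: because the true labels exist, the model's predictions can be scored against them.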
Unsupervised Learning: You only have input features — no target variable. The algorithm finds hidden patterns, groupings, or structure in the data.
Finance Applications:
- Portfolio clustering: Group stocks by return behavior (not just sector) to build truly diversified portfolios
- Regime detection: Identify market regimes (bull/bear/sideways) from price and volatility patterns
- Anomaly detection: Flag unusual trading patterns without pre-defining what 'unusual' means
- Risk factor discovery: Find hidden factors driving asset returns beyond traditional Fama-French factors
Common Algorithms:
- K-means clustering
- Principal Component Analysis (PCA)
- Hierarchical clustering
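As a rough illustration of clustering without labels, the sketch below runs a basic k-means on simulated stocks described by two hypothetical features (market beta and annualized volatility). The data, the feature choice, and the farthest-first initialization are all assumptions made to keep the example short and deterministic. Real features should usually be standardized first, and libraries typically use k-means++ initialization.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iters=50):
    # Farthest-first initialization (an assumption here, chosen for determinism)
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each stock joins its nearest centroid
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Simulated stocks (hypothetical): [market beta, annualized volatility]
rng = random.Random(42)
defensive = [[rng.gauss(0.5, 0.1), rng.gauss(0.15, 0.02)] for _ in range(20)]
cyclical = [[rng.gauss(1.5, 0.1), rng.gauss(0.35, 0.03)] for _ in range(20)]
stocks = defensive + cyclical

labels, centroids = kmeans(stocks, k=2)
print("cluster labels:", labels)
print("cluster centers:", centroids)
```

Note that the algorithm never sees the words "defensive" or "cyclical". It only returns cluster numbers, and it is up to the analyst to decide whether the groups are economically meaningful.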
Hybrid Approach — Semi-Supervised: In practice, financial data often has a small amount of labeled data and a large amount of unlabeled data. Semi-supervised methods use both: train on labeled examples and let the unlabeled data improve the model's understanding of the data distribution.
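One simple semi-supervised technique is self-training (pseudo-labeling): fit a model on the few labeled points, label the unlabeled points it is most confident about, and refit. The sketch below uses a nearest-centroid classifier on simulated transaction data. The classifier choice, the confidence rule (`max_ratio`), and the two hypothetical features are assumptions for illustration, not a standard recipe.

```python
import random

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def self_train(labeled, unlabeled, max_ratio=0.5, rounds=10):
    """labeled: list of (features, label in {0, 1}); unlabeled: list of features.
    Assumes each class has at least one labeled example."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        c = {k: centroid([x for x, y in labeled if y == k]) for k in (0, 1)}
        keep, added = [], 0
        for x in pool:
            d0, d1 = dist(x, c[0]), dist(x, c[1])
            near, far = min(d0, d1), max(d0, d1)
            # Confident only if the point is much closer to one centroid
            if far > 0 and near / far <= max_ratio:
                labeled.append((x, 0 if d0 < d1 else 1))  # pseudo-label
                added += 1
            else:
                keep.append(x)
        pool = keep
        if added == 0:
            break
    final = {k: centroid([x for x, y in labeled if y == k]) for k in (0, 1)}
    return final, labeled, pool

rng = random.Random(7)

def sample(center, n):
    return [[rng.gauss(center[0], 0.7), rng.gauss(center[1], 0.7)] for _ in range(n)]

# Simulated transactions (hypothetical features); label 1 = fraud
legit_center, fraud_center = (0.0, 0.0), (3.0, 3.0)
labeled = [(x, 0) for x in sample(legit_center, 3)] + [(x, 1) for x in sample(fraud_center, 3)]
unlabeled = sample(legit_center, 100) + sample(fraud_center, 100)
true_labels = [0] * 100 + [1] * 100  # known only because the data is simulated

centroids, augmented, leftover = self_train(labeled, unlabeled)
preds = [0 if dist(x, centroids[0]) < dist(x, centroids[1]) else 1 for x in unlabeled]
acc = sum(p == t for p, t in zip(preds, true_labels)) / len(true_labels)
print("pseudo-labeled points added:", len(augmented) - len(labeled))
print("accuracy on the unlabeled set:", acc)
```

Only six transactions carry real labels here. The rest of the training set is built from the model's own confident guesses, which is why self-training can also reinforce early mistakes if the confidence rule is too loose.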
Key Tradeoffs for Analysts:
| Aspect | Supervised | Unsupervised |
|---|---|---|
| Data requirement | Labeled data (expensive) | Unlabeled data (abundant) |
| Evaluation | Clear metrics (accuracy, RMSE) | Subjective (are clusters meaningful?) |
| Interpretability | Varies by algorithm | Often harder to interpret |
| Overfitting risk | High (fitting noise in labels) | Lower but not zero (e.g., too many clusters can fit noise) |
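The "clear metrics" in the Evaluation row are straightforward to compute once true labels are available. A small sketch with made-up numbers (both the default flags and the returns are hypothetical):

```python
import math

def accuracy(y_true, y_pred):
    # Share of predictions that match the true class
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error for numeric predictions
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Classification: 1 = default, 0 = no default (hypothetical)
default_true = [0, 0, 1, 1, 0]
default_pred = [0, 1, 1, 1, 0]

# Regression: monthly returns (hypothetical)
ret_true = [0.02, -0.01, 0.03, 0.00]
ret_pred = [0.01, 0.00, 0.01, 0.01]

print("accuracy:", accuracy(default_true, default_pred))
print("RMSE:", rmse(ret_true, ret_pred))
```

Unsupervised models have no equivalent of `default_true` to compare against, which is why their evaluation leans more on judgment.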