AcadiFi
ClusterQuant_Nils · 2026-04-04
CFA · Level II · Quantitative Methods

How is K-means clustering used to group assets for portfolio construction, and what are its limitations with financial return data?

I'm exploring unsupervised learning methods in CFA quantitative methods and want to understand how K-means can replace traditional sector classifications for portfolio construction. The idea of letting return patterns define asset groups sounds appealing, but I'm worried about K-means assumptions (spherical clusters, equal variance) clashing with the reality of financial data.

88 upvotes
AcadiFi Team · Verified Expert
AcadiFi Certified Professional

K-means clustering partitions assets into K groups by minimizing the within-cluster sum of squares (WCSS), the total squared distance between each asset and its cluster centroid. Instead of relying on subjective sector labels (which can mix companies that behave very differently), clustering uses statistical return patterns to reveal natural groupings.

Algorithm Steps:
1. Choose K (number of clusters)
2. Initialize K centroids randomly
3. Assign each asset to the nearest centroid (Euclidean distance in feature space)
4. Recompute centroids as the mean of assigned assets
5. Repeat steps 3-4 until assignments stabilize

Worked Example:
Lakefront Asset Management wants to diversify a 50-stock portfolio beyond traditional GICS sectors. They compute 8 features for each stock: 12-month return, 60-day volatility, beta, dividend yield, P/E ratio, debt-to-equity, revenue growth, and earnings stability. Because the distance is Euclidean, the features should be standardized (for example, to z-scores) first; otherwise a wide-ranging feature such as P/E would dominate a narrow one such as dividend yield.

Running K-means with K=6 produces:

| Cluster | Profile | Stocks | Traditional Sectors Mixed |
|---|---|---|---|
| 1 | High-growth, high-vol | 8 | Tech + Biotech + Consumer Discretionary |
| 2 | Stable dividend payers | 11 | Utilities + REITs + Consumer Staples |
| 3 | Cyclical value | 9 | Industrials + Materials + Energy |
| 4 | Defensive low-beta | 7 | Healthcare + Telecom + Utilities |
| 5 | Leveraged growth | 8 | Financials + Tech + Real Estate |
| 6 | Quality compounders | 7 | Tech + Healthcare + Consumer |

Clusters 2 and 4 both contain Utilities stocks, but cluster 2 groups some of them with REITs based on yield characteristics, while cluster 4 groups the others with Healthcare based on low-beta behavior. This captures economically meaningful distinctions that sector labels miss.

Choosing K (Elbow Method):
Plot within-cluster sum of squares (WCSS) against K and look for the point where adding another cluster yields only a small further reduction. Lakefront tested K=3 to K=12:

- K=3: WCSS=284 (too few, heterogeneous clusters)
- K=6: WCSS=121 (elbow point, clear inflection)
- K=10: WCSS=89 (marginal improvement, overly granular)

K=6 provided the best balance between granularity and statistical stability.

Limitations with Financial Data:
- K-means assumes roughly spherical clusters of similar variance, whereas financial return distributions are often elongated or asymmetric
- Sensitive to outliers (extreme returns pull centroids toward them)
- Clusters may be unstable across time periods; quarterly reclustering often reassigns 20-30% of assets
- Euclidean distance becomes less informative as the number of features grows (the curse of dimensionality)

Alternatives: Hierarchical clustering does not require K to be fixed in advance and, depending on the linkage used, can capture non-spherical shapes. DBSCAN infers the number of clusters from density parameters rather than taking K as an input, and it labels outliers as noise instead of forcing them into a cluster.

Explore clustering applications in our CFA Quantitative Methods question bank.
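The algorithm steps and the elbow method described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not Lakefront's actual workflow: the data are synthetic, the two features (volatility and dividend yield) and the three profiles are made up, and the helper names `standardize`, `sq_dist`, and `kmeans` are hypothetical.

```python
import random
import statistics


def standardize(rows):
    """Z-score each feature column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means loop following steps 1-5; returns (labels, centroids, wcss)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # step 2: random init
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each asset to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # step 5: stop once assignments stabilize
            break
        labels = new_labels
        # Step 4: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Hypothetical data: 30 synthetic "stocks" with two features
# (annualized volatility, dividend yield) drawn around three made-up profiles.
data_rng = random.Random(42)
profiles = [(0.45, 0.005), (0.15, 0.040), (0.25, 0.020)]
raw = [
    [data_rng.gauss(vol, 0.02), data_rng.gauss(dy, 0.003)]
    for vol, dy in profiles
    for _ in range(10)
]
features = standardize(raw)

# Elbow method: record WCSS for each K and look for where the drop flattens.
wcss_by_k = {}
for k in range(1, 7):
    # Keep the best of several random starts, since K-means only finds local optima.
    wcss_by_k[k] = min(kmeans(features, k, seed=s)[2] for s in range(10))

for k, w in wcss_by_k.items():
    print(f"K={k}: WCSS={w:.2f}")
```

With standardized features, WCSS at K=1 equals the number of assets times the number of features, which gives a convenient baseline for judging how much each additional cluster helps. In practice a library implementation such as scikit-learn's `KMeans` (which exposes WCSS as `inertia_`) would normally replace the hand-written loop.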


#k-means #clustering #asset-grouping #unsupervised-learning #portfolio-construction