A Survey of Clustering Federated Learning in Heterogeneous Data Scenarios
DOI:
https://doi.org/10.54097/v7wcad61
Keywords:
Clustering Federated Learning, Data Heterogeneity, Federated Learning, Non-Independent and Identically Distributed (Non-IID)
Abstract
Federated learning, as a collaborative training paradigm that preserves raw data privacy, offers an effective solution for data protection concerns. However, its practical implementation faces significant challenges due to data heterogeneity. This heterogeneity manifests as non-independent and identically distributed (non-IID) data across participating entities, resulting in degraded model performance, slower convergence rates, and training instability. While conventional federated learning approaches—including parameter averaging, knowledge distillation, and personalization techniques—offer certain advantages, their efficacy remains limited in severely heterogeneous environments. This survey systematically examines research advancements in clustered federated learning for addressing data heterogeneity challenges, encompassing fundamental principles, model architecture development, and algorithmic implementations. We provide a detailed analysis of innovative algorithms ranging from IFCA to FedGroup, and from FCL-GNN to FedAC, highlighting their technical contributions and applicable scenarios. Furthermore, we explore emerging research directions including clustering interpretability, multi-source heterogeneous information fusion, dynamic clustering mechanisms, and resource-aware optimization. Clustered federated learning effectively enhances model performance and convergence efficiency while maintaining privacy by grouping participants with similar data distributions into clusters and training specialized models for each cluster. With ongoing technological progress, clustered federated learning shows promise for achieving an optimal balance between privacy preservation and learning efficiency in critical domains such as healthcare and finance, thereby contributing to the sustainable development of artificial intelligence technologies.
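The grouping idea described in the abstract (clients with similar data distributions are clustered, and a specialized model is trained per cluster) can be illustrated with a minimal sketch. It is not the procedure of IFCA, FedGroup, FCL-GNN, FedAC, or any other surveyed algorithm. It assumes that each client reports a flat parameter vector, that similarity is measured by cosine similarity between those vectors, and that aggregation within a cluster is a sample-size-weighted average. The function names `cluster_clients` and `aggregate_per_cluster` and the synthetic data are hypothetical.

```python
import numpy as np


def cluster_clients(updates, k, iters=20):
    """Group clients by cosine similarity of their parameter vectors.

    Assumption: a simple spherical k-means with deterministic
    farthest-point initialisation stands in for the clustering step.
    """
    X = updates / np.linalg.norm(updates, axis=1, keepdims=True)
    centers = [X[0]]
    for _ in range(1, k):
        # Pick the client least similar to all centers chosen so far.
        sims = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[np.argmin(sims)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the old center for an empty cluster
            mean = members.mean(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                centers[j] = mean / norm
    return np.argmax(X @ centers.T, axis=1)


def aggregate_per_cluster(weights, sizes, labels):
    """Sample-size-weighted average of client parameters within each cluster."""
    models = {}
    for j in np.unique(labels):
        mask = labels == j
        models[int(j)] = np.average(weights[mask], axis=0, weights=sizes[mask])
    return models


# Synthetic example (assumption): six clients drawn from two latent
# distributions, so their parameter vectors point in two distinct directions.
rng = np.random.default_rng(0)
dir_a = np.array([1.0, 1.0, 0.0, 0.0])
dir_b = np.array([0.0, 0.0, 1.0, 1.0])
client_weights = np.vstack(
    [dir_a + 0.05 * rng.standard_normal(4) for _ in range(3)]
    + [dir_b + 0.05 * rng.standard_normal(4) for _ in range(3)]
)
client_sizes = np.array([10, 20, 30, 40, 50, 60])

labels = cluster_clients(client_weights, k=2)
cluster_models = aggregate_per_cluster(client_weights, client_sizes, labels)
print(labels)
for cluster_id, model in cluster_models.items():
    print(cluster_id, np.round(model, 2))
```

In this toy setting the first three clients fall into one cluster and the last three into the other, and each cluster model stays close to the direction of its own group instead of being pulled toward a single global average.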
License
Copyright (c) 2025 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.