A concept drift-oriented active learning method
DOI:
https://doi.org/10.54097/14pf6057
Keywords:
Data stream, Active learning, Concept drift
Abstract
The continuous growth and evolving internal distribution of streaming data make concept drift a long-standing challenge in classification tasks. In practical applications, labeling costs are often prohibitively high, and active learning is frequently employed to mitigate the scarcity of labels. However, in drifting environments, relying on a single sampling strategy is prone to selection bias and information redundancy. This paper proposes a concept drift-oriented active learning method, termed CDAL, which integrates drift detection with a clustering-based representative sampling mechanism. When a distribution change is detected, a candidate set is constructed from neighboring samples and partitioned using online clustering. The global average uncertainty is then calculated as a dynamic reference, and the overall uncertainty level of each cluster is compared against it. Based on this comparison, either the centroid sample or a random sample is adaptively selected from the cluster for labeling. This design eliminates the need for preset fixed thresholds and balances the information content of samples with their distribution coverage while controlling annotation costs. Experiments on multiple real-world and synthetic datasets show that CDAL achieves higher cumulative accuracy and related evaluation metrics than baseline methods and rapidly restores classification performance on new concepts after drift occurs, supporting the effectiveness of the proposed strategy.
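The per-cluster selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: uncertainty is measured as least confidence (1 minus the top class probability), a cluster whose average uncertainty is at or below the global average yields its centroid-nearest sample, and a cluster above the global average yields a random member. Drift detection and online clustering are assumed to have already run, so the hypothetical function `select_queries` takes cluster assignments as input.

```python
import random
from statistics import mean


def uncertainty(probs):
    """Least-confidence score: 1 minus the top class probability (an assumed measure)."""
    return 1.0 - max(probs)


def select_queries(samples, probs, cluster_ids, rng=None):
    """Return {cluster_id: index of the sample chosen for labeling}.

    samples:     list of feature vectors (lists of floats)
    probs:       list of predicted class-probability lists, one per sample
    cluster_ids: cluster assignment per sample (produced by some clustering step)
    """
    rng = rng or random.Random(0)
    scores = [uncertainty(p) for p in probs]
    global_avg = mean(scores)  # dynamic reference instead of a preset threshold

    # Group sample indices by cluster.
    clusters = {}
    for idx, cid in enumerate(cluster_ids):
        clusters.setdefault(cid, []).append(idx)

    dim = len(samples[0])
    chosen = {}
    for cid, members in clusters.items():
        cluster_avg = mean(scores[i] for i in members)
        if cluster_avg <= global_avg:
            # ASSUMPTION: a relatively confident cluster is represented by
            # the member closest to its centroid.
            centroid = [mean(samples[i][d] for i in members) for d in range(dim)]
            chosen[cid] = min(
                members,
                key=lambda i: sum((samples[i][d] - centroid[d]) ** 2 for d in range(dim)),
            )
        else:
            # ASSUMPTION: a relatively uncertain cluster is sampled at random.
            chosen[cid] = rng.choice(members)
    return chosen


# Toy candidate set: cluster 0 is confidently classified, cluster 1 is not.
samples = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1], [5.0, 5.0], [5.2, 5.1], [4.9, 5.0]]
probs = [[0.9, 0.1], [0.95, 0.05], [0.9, 0.1], [0.55, 0.45], [0.5, 0.5], [0.6, 0.4]]
cluster_ids = [0, 0, 0, 1, 1, 1]
picks = select_queries(samples, probs, cluster_ids)
```

In this toy example, cluster 0 falls below the global average uncertainty, so the sample nearest its centroid (index 2) is chosen, while cluster 1 falls above it and contributes one randomly drawn member. The direction of the rule could be reversed without changing the structure of the sketch.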
Copyright (c) 2026 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








