A Hybrid Data-Driven Machine Learning Approach for Anomaly Detection
DOI: https://doi.org/10.54097/vbe18a52
Keywords: Random Forest, SHAP, Statistical Modeling, Anomaly Detection, NIPT
Abstract
Anomaly detection is an important application area of machine learning. High-dimensional data often combine extreme class imbalance with complex nonlinear structure, so traditional anomaly detection methods struggle to deliver both accurate identification and model interpretability. Using non-invasive prenatal testing (NIPT) data as the research setting, this paper proposes a two-stage expert model and builds an anomaly detection strategy that integrates statistical analysis with machine learning. After the data are appropriately oversampled, a random forest is used for a preliminary check of the reliability of the results; the recall of each expert model on the test set exceeds 0.9. SHAP is then introduced to analyze main effects and interaction effects, and the analysis indicates that each chromosome's Z-score plays the central role in its corresponding model. Drawing on quality control indicators from NIPT theory, the paper applies SHAP single-feature dependence analysis to examine the stability of those indicators, constructs a standardized quality control score, and divides samples into high-, medium-, and low-risk regions, with an abnormality rate of only 7.6% in the low-risk region. On this basis, the results of multi-indicator collaborative quality control are analyzed and predicted.
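The abstract names three building blocks: oversampling of the rare class, recall as the evaluation metric, and a score-based split into risk regions. The following is a minimal standard-library sketch of them, not the authors' implementation. The abstract does not say which oversampling method was used, so random duplication is an assumption here. The function names, the toy data, the cut-offs 0.3 and 0.7, and the convention that a higher score means higher risk are all hypothetical. The random forest and SHAP steps are omitted because they need third-party libraries.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size.

    Assumption: plain random duplication; the paper may use a different method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive (abnormal) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def risk_tier(score, low_cut=0.3, high_cut=0.7):
    """Map a quality control score to a risk region.

    Hypothetical convention: score in [0, 1], higher means riskier;
    the cut-offs are placeholders, not values from the paper.
    """
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


def abnormality_rate(labels, tiers, tier):
    """Share of abnormal samples (label 1) among the samples assigned to one tier."""
    in_tier = [y for y, t in zip(labels, tiers) if t == tier]
    return sum(in_tier) / len(in_tier) if in_tier else 0.0


# Toy data: 8 normal samples (label 0) and 2 abnormal ones (label 1).
rows = [[0.1], [0.2], [0.3], [0.1], [0.2], [0.4], [0.3], [0.2], [3.5], [4.1]]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Oversample the training data only; a held-out test set should stay untouched.
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))

# Assign hypothetical quality control scores to risk regions and summarize one region.
scores = [0.05, 0.10, 0.20, 0.25, 0.40, 0.50, 0.60, 0.65, 0.80, 0.90]
tiers = [risk_tier(s) for s in scores]
print(abnormality_rate(labels, tiers, "low"))
```

In a fuller pipeline, the balanced training set would be passed to a random forest classifier, and `recall` would be computed on test data that was never oversampled, so that duplicated rows cannot inflate the metric.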
License
Copyright (c) 2026 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








