A Hybrid Data-Driven Machine Learning Approach for Anomaly Detection
DOI:
https://doi.org/10.54097/vbe18a52
Keywords:
Random Forest, SHAP, Statistical Modeling, Anomaly Detection, NIPT
Abstract
Anomaly detection is an important application area of machine learning. Because high-dimensional data often exhibit extreme class imbalance and complex nonlinearity, traditional anomaly detection methods struggle to achieve both high-precision identification and model interpretability. Using non-invasive prenatal testing (NIPT) data as the research setting, this paper proposes a two-stage expert model and constructs an anomaly detection strategy that integrates statistical analysis with machine learning. After appropriate oversampling of the data, the random forest algorithm is used for a preliminary check of the reliability of the results; the recall of each expert model on the test set exceeds 0.9. SHAP is then used to analyze main effects and interaction effects, showing that the Z-score of each chromosome plays the core role in its corresponding model. Building on the quality control of indicators in NIPT theory, the paper applies SHAP single-feature dependence analysis to explore the stability of the quality control indicators, constructs a standardized quality control score, and divides samples into high-, medium-, and low-risk regions, with an abnormality rate of only 7.6% in the low-risk region. On this basis, the results of multi-indicator collaborative quality control are analyzed and predicted.
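The abstract names two ingredients that are easy to illustrate in isolation: oversampling the rare class before training, and judging each model by its recall. The following is a minimal sketch in standard-library Python, not the paper's implementation. The function names `random_oversample` and `recall`, the toy Z-score values, and the choice of plain random duplication are all assumptions made for illustration; the abstract does not say which oversampling method was used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement only as many extra rows as the class lacks.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (made up): one feature standing in for a chromosome Z-score,
# with label 1 marking the rare abnormal class.
rows = [[0.1], [0.3], [-0.2], [0.0], [3.5]]
labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))

# Recall on a hypothetical set of predictions.
print(recall([1, 1, 0, 0], [1, 0, 0, 1]))
```

In practice oversampling would be applied to the training split only, so that duplicated abnormal samples do not leak into the test set on which recall is reported.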
License
Copyright (c) 2026 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








