Accelerating Deep Learning Inference for Brain Tumor Segmentation: A Review of Architectures, Frameworks, and Clinical Translation

Junyi He

doi:10.54097/qqsv1d58

Keywords:

Brain tumor segmentation, Inference efficiency, TensorRT, TVM, PyTorch 2.0

Abstract

Brain tumor segmentation has transitioned from an accuracy-dominant research agenda to a constrained multi-objective optimization problem in which segmentation quality, latency, memory footprint, robustness to missing modalities, and deployment reproducibility must be optimized jointly rather than sequentially. Focusing on work published from 2023 to 2026, this review analyzes inference acceleration from a systems perspective that links architectural evolution, model-compression and compilation strategies, and hardware-aware deployment constraints within a single causal framework. Specifically, we synthesize evidence on the shift from CNN/nnU-Net baselines to Transformer hybrids and linear-time State Space Model families; evaluate the practical effects of mixed precision, quantization, pruning, distillation, and compiled runtimes; and examine how robustness to missing modalities interacts with graph stability and, through it, with real-world compiler efficiency. On the basis of this synthesis, we formulate a reproducible evaluation protocol for acceleration claims that reduces cross-paper comparability errors and makes reported latency evidence clinically interpretable. We conclude with a forward-looking engineering roadmap indicating how the field can move from benchmark-centric speed demonstrations to reliable real-time segmentation systems suitable for heterogeneous hospital infrastructure.
