YOLO-RSLW: Research on Optimization of Lightweight YOLO11 Object Detection Algorithm

Authors

  • Guanxun Cui
  • Wei Xiao

DOI:

https://doi.org/10.54097/04a3zc75

Keywords:

Object detection, YOLO11, Lightweight network, Feature fusion, Loss function optimization, DOTAv1.5 dataset

Abstract

To address the challenges of excessive parameters and high computational complexity in YOLO-series models under resource-constrained environments, this paper proposes an improved YOLO11 framework named YOLO-RSLW, which focuses on the synergistic optimization of lightweight design and accuracy enhancement. The proposed approach systematically reconstructs the model architecture by targeting two critical components, feature fusion and detection head design, thereby significantly reducing model complexity while improving detection performance. Specifically, a lightweight RepNCSPELAN4Lighter module is designed based on reparameterization techniques to replace the original C3k2 modules in both the backbone and neck networks. The SPPF and C2PSA modules are removed, and an SPPELAN module is introduced to enhance feature representation capability. An efficient LiteShiftHead is proposed to optimize the detection pipeline through task-decoupled group-wise processing and lightweight convolutions. Furthermore, the WIoU loss function is adopted in place of the original loss computation, improving the accuracy and robustness of bounding box regression. Experimental results on the DOTAv1.5 aerial image dataset demonstrate that the improved model reduces the parameter count by 37.3% (down to 5.92M) while achieving a 1.1 percentage point improvement in both mAP50 and mAP50-95 (with mAP50 reaching 38.2%), realizing a joint gain in accuracy and efficiency. Ablation studies validate the effectiveness of each proposed module: RepNCSPELAN4Lighter and LiteShiftHead contribute most to model compression, while the WIoU loss further improves localization accuracy. This study provides a more efficient detection solution for embedded platforms and real-time vision applications.
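The WIoU loss mentioned above follows Wise-IoU (Tong et al. [5]), whose v1 form scales the plain IoU loss by a distance-attention factor computed from the centre offset and the smallest enclosing box. The sketch below is a minimal plain-Python illustration of that v1 formulation for single boxes in (x1, y1, x2, y2) format; it is not the paper's implementation, which would operate on batched tensors inside the training loop.

```python
import math

def wiou_v1(pred, target):
    """Wise-IoU v1 sketch: distance-attention factor times the IoU loss.

    pred, target: boxes as (x1, y1, x2, y2) tuples.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection over union
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Width/height of the smallest enclosing box (W_g, H_g); in the
    # original formulation this term is detached from the gradient graph
    # so it only rescales the loss rather than steering regression.
    wg = max(px2, tx2) - min(px1, tx1)
    hg = max(py2, ty2) - min(py1, ty1)

    # Distance attention R_WIoU = exp(d^2 / (W_g^2 + H_g^2)),
    # where d is the distance between box centres.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))

    return r_wiou * (1.0 - iou)

# Identical boxes: IoU = 1, zero centre distance -> loss 0
print(wiou_v1((0, 0, 2, 2), (0, 0, 2, 2)))  # -> 0.0
```

For misaligned boxes the attention factor exceeds 1, so the penalty is strictly larger than the plain IoU loss, which is how WIoU refocuses regression on poorly localized anchors.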


References

[1] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 779-788.

[2] Ding X, Zhang X, Ma N, et al. RepVGG: Making VGG-style ConvNets great again[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, 2021: 13733-13742.

[3] Ge Z, Liu S, Wang F, et al. YOLOX: Exceeding YOLO series in 2021[J]. arXiv preprint arXiv:2107.08430, 2021.

[4] Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

[5] Tong Z, Chen Y, Xu Z, et al. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism[J]. arXiv preprint arXiv:2301.10051, 2023.

[6] Zhang X, Zhou X, Lin M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, 2018: 6848-6856.

[7] Tian Z, Shen C, Chen H, et al. FCOS: Fully convolutional one-stage object detection[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South), 2019: 9627-9636.

[8] Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming[C]. Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2736-2744.

[9] Yang B, Bender G, Le Q V, et al. CondConv: Conditionally parameterized convolutions for efficient inference[C]. Advances in Neural Information Processing Systems. Vancouver, Canada, 2019, 32.

[10] Zheng Z, Wang P, Liu W, et al. Distance-IoU loss: Faster and better learning for bounding box regression[C]. Proceedings of the AAAI Conference on Artificial Intelligence. New York, USA, 2020, 34(07): 12993-13000.

[11] Wang C Y, Yeh I H, Liao H Y M. YOLOv9: Learning what you want to learn using programmable gradient information[C]. European Conference on Computer Vision. Milan, Italy, 2024: 1-18.

[12] Lee J H, Kim D H, Park S J. Enhanced Swine Behavior Detection with YOLOs and a Mixed Efficient Layer Aggregation Network in Real Time[J]. Animals, 2024, 14(23): 3410.

[13] Doloriel C T C, Cajote R D. Improving the detection of small oriented objects in aerial images[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops. Waikoloa, HI, USA, 2023: 176-185.

[14] Zhu P, Wen L, Du D, et al. VisDrone2019[DS]. 2024. https://doi.org/10.57702/auwyaezl

[15] Xie J. STK-YOLO[DS]. IEEE DataPort, 2025. https://doi.org/10.21227/1rh2-kc55

[16] Paszke A, Gross S, Massa F, et al. PyTorch: An imperative style, high-performance deep learning library[C]. Advances in Neural Information Processing Systems. Vancouver, Canada, 2019, 32.

[17] Bottou L. Large-scale machine learning with stochastic gradient descent[C]. Proceedings of COMPSTAT'2010. Paris, France, 2010: 177-186.

Published

17-03-2026

Section

Articles

How to Cite

Cui, G., & Xiao, W. (2026). YOLO-RSLW: Research on Optimization of Lightweight YOLO11 Object Detection Algorithm. Journal of Computing and Electronic Information Management, 20(3), 1-8. https://doi.org/10.54097/04a3zc75