OverLoCK-GPH: A Bio-Inspired Object Detector with Graph-Prior Modulation and Hybrid Instance Refinement
DOI: https://doi.org/10.54097/0cetjv49
Keywords: Object Detection, Mask R-CNN, OverLoCK, Prior-Guided Modulation, Graph Attention, Hybrid BBox Head
Abstract
Mask R-CNN is a visual model based on convolutional neural networks and used for object detection. Its backbone is typically a ResNet which, through successive convolution and downsampling, extracts texture and semantic features layer by layer while treating all image regions equally. As a result, a large amount of background noise is mistaken for useful information and interferes with target localization. In addition, the neck uses a simple top-down additive fusion. This fusion is static and linear and is limited by the local receptive field of the convolution kernel, so the FPN lacks spatial relationship modeling, which leads to incomplete detections and inaccurate localization. This paper proposes an enhanced detection framework named OverLoCK-GPH. First, we use the Overview-Net of OverLoCK to generate a global context prior and inject it into the features at each level through a novel prior-guided feature pyramid network, achieving dynamic spatial weight modulation. Second, we introduce a Graph Attention Block at the high-level feature extraction stage, which captures long-range semantic dependencies by modeling pixels as graph nodes. Finally, we design a Hybrid Instance Refinement Head for detection, which suppresses background noise at the RoI level through a channel attention mechanism. Experiments demonstrate that the method significantly outperforms the baseline model in complex scenarios and effectively reduces missed and false detections of blurred targets.
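The three mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify layer definitions, so every function name here (`prior_guided_modulation`, `pixel_graph_attention`, `channel_attention`) is hypothetical, the graph is assumed to be fully connected with dot-product similarity, and the channel gate is assumed to follow a squeeze-and-excitation style design with random, untrained weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def resize_nearest(prior, out_h, out_w):
    """Nearest-neighbour resize of a (h, w) map to (out_h, out_w)."""
    h, w = prior.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return prior[rows][:, cols]

def prior_guided_modulation(feature, prior):
    """Scale a (C, H, W) feature map by a spatial gate built from a (h, w) prior.

    Assumption: the global context prior acts as a per-pixel sigmoid gate.
    """
    _, H, W = feature.shape
    gate = sigmoid(resize_nearest(prior, H, W))    # (H, W), values in [0, 1]
    return feature * gate[None, :, :]

def pixel_graph_attention(feature):
    """Treat every pixel as a graph node and aggregate neighbours by similarity.

    Assumption: fully connected graph, scaled dot-product edge weights,
    residual connection back to the input.
    """
    C, H, W = feature.shape
    nodes = feature.reshape(C, H * W).T            # (N, C) node features
    scores = nodes @ nodes.T / np.sqrt(C)          # (N, N) edge scores
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    out = weights @ nodes                          # aggregate over all nodes
    return out.T.reshape(C, H, W) + feature

def channel_attention(roi_feature, w1, w2):
    """Squeeze-and-excitation style channel gate on a (C, h, w) RoI feature."""
    squeezed = roi_feature.mean(axis=(1, 2))       # (C,) global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)        # (C // r,) ReLU bottleneck
    gate = sigmoid(w2 @ hidden)                    # (C,) per-channel weight
    return roi_feature * gate[:, None, None]

# Toy usage with random inputs (shapes are illustrative only).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))            # one pyramid level
prior = rng.standard_normal((4, 4))                # coarse context prior
modulated = prior_guided_modulation(feat, prior)
attended = pixel_graph_attention(modulated)
roi = attended[:, :7, :7]                          # stand-in for an RoI crop
w1 = rng.standard_normal((2, 8))                   # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
refined = channel_attention(roi, w1, w2)
```

In this sketch the spatial gate and the channel gate only ever attenuate activations, which mirrors the stated goal of suppressing background noise; a trained model would learn the prior and the gate weights rather than sample them.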
License
Copyright (c) 2026 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








