OverLoCK-GPH: A Bio-Inspired Object Detector with Graph-Prior Modulation and Hybrid Instance Refinement
DOI:
https://doi.org/10.54097/0cetjv49
Keywords:
Object Detection, Mask R-CNN, OverLoCK, Prior-Guided Modulation, Graph Attention, Hybrid BBox Head
Abstract
Mask R-CNN is a vision model based on convolutional neural networks and applied to object detection. In the Mask R-CNN architecture, the backbone is typically a ResNet. Through repeated convolution and downsampling, it extracts the texture and semantic features of the image equally, layer by layer, so a large amount of background noise is mistaken for useful information and interferes with target localization. In addition, the neck uses a simple top-down additive fusion. This fusion is static and linear, and it is limited by the local receptive field of the convolution kernel. As a result, the FPN lacks spatial relationship modeling, which leads to incomplete detections and inaccurate localization. This paper proposes an enhanced detection framework named OverLoCK-GPH. First, we use the Overview-Net of OverLoCK to generate a global context prior and inject it into the features at each level through a novel prior-guided feature pyramid network, achieving dynamic spatial weight modulation. Second, we introduce a Graph Attention Block at the high-level feature extraction stage, which captures long-range semantic dependencies by modeling pixels as graph nodes. Finally, we design a Hybrid Instance Refinement Head for detection, which suppresses background noise at the RoI level through a channel attention mechanism. Experiments demonstrate that the method significantly outperforms the baseline model in complex scenarios and effectively reduces missed and false detections of fuzzy targets.
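The three mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no equations, so everything below is an assumption chosen for illustration rather than the authors' implementation: the prior is applied as a sigmoid spatial gate, the graph attention is a dense dot-product attention over all pixels with a residual connection, and the channel attention is a squeeze-and-excitation style gate. All function names, shapes, and weights (`prior_guided_modulation`, `pixel_graph_attention`, `channel_attention`, `w1`, `w2`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def prior_guided_modulation(feat, prior):
    """Reweight every spatial location of feat (C, H, W) by a gate
    derived from a context prior (H, W). Sigmoid gating is an assumption."""
    gate = sigmoid(prior)              # (H, W), values in (0, 1)
    return feat * gate[None, :, :]     # broadcast the gate over channels


def pixel_graph_attention(feat):
    """Treat each pixel as a node of a fully connected graph and
    aggregate node features with softmax attention (assumed form)."""
    c, h, w = feat.shape
    nodes = feat.reshape(c, h * w).T                # (N, C), one row per pixel
    scores = nodes @ nodes.T / np.sqrt(c)           # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    aggregated = weights @ nodes                    # (N, C) long-range context
    return (nodes + aggregated).T.reshape(c, h, w)  # residual connection


def channel_attention(roi_feat, w1, w2):
    """Squeeze-and-excitation style channel gate on one RoI feature (C, H, W).
    w1 (C/r, C) and w2 (C, C/r) stand in for learned weights."""
    squeezed = roi_feat.mean(axis=(1, 2))       # (C,) global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)     # ReLU bottleneck
    gate = sigmoid(w2 @ hidden)                 # (C,) per-channel weight
    return roi_feat * gate[:, None, None]


# Toy usage with random tensors standing in for real features.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))    # one pyramid level: C=8, H=W=4
prior = rng.standard_normal((4, 4))      # stand-in for the global context prior
w1 = rng.standard_normal((2, 8))         # reduction ratio r=4 (assumed)
w2 = rng.standard_normal((8, 2))

modulated = prior_guided_modulation(feat, prior)
refined = pixel_graph_attention(modulated)
roi = channel_attention(refined, w1, w2)
print(modulated.shape, refined.shape, roi.shape)
```

In this sketch both gates only attenuate: locations or channels with a low gate value are suppressed, which is one simple way to read "dynamic weight modulation in space" and "suppresses background noise at the ROI level". A real detector would learn the prior and the weights end to end, and would likely restrict the pixel graph to a sparse neighborhood to keep the attention cost manageable.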
Copyright (c) 2026 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








