Fish Feeding Behavior Recognition Based on Enhanced MobileViTv3 Model
DOI: https://doi.org/10.54097/cacjn071

Keywords: Fish feeding behavior recognition, Multi-feature extraction, Attention mechanism, MobileViTv3

Abstract
Video stream-based fish feeding behavior recognition has attracted significant attention in recent years, as it accelerates the optimization of feeding strategies and improves aquaculture efficiency. However, current feeding intensity assessment methods suffer from the inefficiency and subjectivity of manual observation, compounded by the difficulty of accurately extracting behavioral features given the high mobility and random movement patterns of outdoor-cultured fish. Constructing an efficient multi-feature extraction model for fish feeding recognition, particularly one deployable on mobile and edge devices, remains a critical challenge. To address these limitations, this paper proposes a multi-feature extraction network based on an improved MobileViTv3. The network takes video streams as input and tackles the large model size, high computational complexity, and insufficient feature extraction of current models through three key innovations: (1) a Multi-Scale Convolution Module (MSCM) that concurrently captures spatiotemporal, motion, and channel features from video streams; (2) a Feature Fusion Convolutional Block Attention Module (FCBAM) that combines shallow and deep features with adaptive attention weighting; and (3) a BiasLoss function with dynamic scaling that mitigates intra-class variation and low-quality training data. Evaluated on grass carp and crucian carp, the model achieves 97.7% accuracy in feeding intensity classification with only 5.8 M parameters, outperforming C3D-ConvLSTM and MobileNetV3-Small baselines while demonstrating enhanced robustness for edge deployment.
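To illustrate the dynamic-scaling idea behind the BiasLoss mentioned above, the following is a minimal pure-Python sketch, not the authors' implementation. Following the general approach of Abrahamyan et al. [24], each sample's cross-entropy term is reweighted by a function of its feature-map variance, so that low-variance (uninformative or low-quality) samples contribute less to the gradient. The exact weighting function `exp(alpha * var) - 1` and the parameter `alpha` are illustrative assumptions, not values taken from the paper.

```python
import math

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])

def bias_weight(feature_map, alpha=0.3):
    # Hypothetical dynamic scaling after Abrahamyan et al.:
    # the weight grows with the variance of the sample's features,
    # so a zero-variance (uninformative) sample gets weight 0.
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((x - mean) ** 2 for x in feature_map) / n
    return math.exp(alpha * var) - 1.0

def bias_loss(batch_probs, labels, batch_features, alpha=0.3):
    # Variance-weighted cross-entropy averaged over the batch.
    total = 0.0
    for probs, label, feats in zip(batch_probs, labels, batch_features):
        total += bias_weight(feats, alpha) * cross_entropy(probs, label)
    return total / len(labels)
```

Under this scheme, samples whose features carry little information (e.g. frames with no visible feeding activity) are down-weighted automatically, which is one plausible way to handle the intra-class variation the abstract describes.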
References
[1] Xu C, Liu Y, Pei Z. Research on Legal Risk Identification, Causes and Remedies for Prevention and Control in China’s Aquaculture Industry[J]. Fishes, 2023, 8(11): 537.
[2] Food and Agriculture Organization of the United Nations, Fisheries Department. The state of world fisheries and aquaculture[M]. Food and Agriculture Organization of the United Nations, 2018.
[3] Li D, Wang Z, Wu S, et al. Automatic recognition methods of fish feeding behavior in aquaculture: A review[J]. Aquaculture, 2020, 528: 735508.
[4] Li D, Wang G, Du L, et al. Recent advances in intelligent recognition methods for fish stress behavior[J]. Aquacultural Engineering, 2022, 96: 102222.
[5] Li Y, Ji B, Shi X, et al. Tea: Temporal excitation and aggregation for action recognition[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 909-918.
[6] Hu X, Liu Y, Zhao Z, et al. Real-time detection of uneaten feed pellets in underwater images for aquaculture using an improved YOLO-V4 network[J]. Computers and electronics in agriculture, 2021, 185: 106135.
[7] Hou S, Liu J, Wang Y, et al. Research on fish bait particles counting model based on improved MCNN[J]. Computers and Electronics in Agriculture, 2022, 196: 106858.
[8] Dauda A B, Ajadi A, Tola-Fabunmi A S, et al. Waste production in aquaculture: Sources, components and managements in different culture systems[J]. Aquaculture and Fisheries, 2019, 4(3): 81-88.
[9] Ye Z, Zhao J, Han Z, et al. Behavioral characteristics and statistics-based imaging techniques in the assessment and optimization of tilapia feeding in a recirculating aquaculture system[J]. Transactions of the ASABE, 2016, 59(1): 345-355.
[10] Zhou C, Xu D, Chen L, et al. Evaluation of fish feeding intensity in aquaculture using a convolutional neural network and machine vision[J]. Aquaculture, 2019, 507: 457-465.
[11] Yang L, Yu H, Cheng Y, et al. A dual attention network based on efficientNet-B2 for short-term fish school feeding behavior analysis in aquaculture[J]. Computers and Electronics in Agriculture, 2021, 187: 106316.
[12] Zhang J L, Xu L H, Liu S J. Classification of Atlantic salmon feeding behavior based on underwater machine vision[J]. Transactions of the Chinese Society of Agricultural Engineering, 2020, 36(13): 158-164.
[13] Zhu M, Zhang Z, Huang H, et al. Classification of perch ingesting condition using lightweight neural network MobileNetV3-Small[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(19): 165-172.
[14] Hu W C, Chen L B, Huang B K, et al. A computer vision-based intelligent fish feeding system using deep learning techniques for aquaculture[J]. IEEE Sensors Journal, 2022, 22(7): 7185-7194.
[15] Tang M, Wu H. Lightweight insulator defect detection algorithm based on improved YOLOv8[C]//Proceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy. 2024: 197-201.
[16] Måløy H, Aamodt A, Misimi E. A spatio-temporal recurrent network for salmon feeding action recognition from underwater videos in aquaculture[J]. Computers and Electronics in Agriculture, 2019, 167: 105087.
[17] Wadekar S N, Chaurasia A. Mobilevitv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features[J]. arXiv preprint arXiv:2209.15159, 2022.
[18] Zeng Y, Yang X, Pan L, et al. Fish school feeding behavior quantification using acoustic signal and improved Swin Transformer[J]. Computers and Electronics in Agriculture, 2023, 204: 107580.
[19] Zhou Y, Chen S, Wang Y, et al. Review of research on lightweight convolutional neural networks[C]//2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC). IEEE, 2020: 1713-1720.
[20] Mehta S, Rastegari M. Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer[J]. arXiv preprint arXiv:2110.02178, 2021.
[21] Dosovitskiy A. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
[22] Wadekar S N, Chaurasia A. Mobilevitv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features[J]. arXiv preprint arXiv:2209.15159, 2022.
[23] Dosovitskiy A. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
[24] Abrahamyan L, Ziatchin V, Chen Y, et al. Bias loss for mobile neural networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 6556-6566.
[25] Woo S, Park J, Lee J Y, et al. Cbam: Convolutional block attention module[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 3-19.
[26] Ma J, Kong D, Wu F, et al. Densely connected convolutional networks for ultrasound image based lesion segmentation[J]. Computers in Biology and Medicine, 2024, 168: 107725.
[27] Peng Y, Wu W, Ren J, et al. Novel GCN model using dense connection and attention mechanism for text classification[J]. Neural Processing Letters, 2024, 56(2): 144.
[28] Goyal P. Accurate, large minibatch SGD: Training ImageNet in 1 hour[J]. arXiv preprint arXiv:1706.02677, 2017.
[29] Loshchilov I, Hutter F. Sgdr: Stochastic gradient descent with warm restarts[J]. arXiv preprint arXiv:1608.03983, 2016.
[30] Lin T. Focal Loss for Dense Object Detection[J]. arXiv preprint arXiv:1708.02002, 2017.
[31] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 2117-2125.
[32] Wen Y, Zhang K, Li Z, et al. A discriminative feature learning approach for deep face recognition[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII. Springer International Publishing, 2016: 499-515.
[33] Loshchilov I, Hutter F. Decoupled weight decay regularization[J]. arXiv preprint arXiv:1711.05101, 2017.
[34] Wang Z, She Q, Smolic A. Action-net: Multipath excitation for action recognition[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 13214-13223.
[35] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.
[36] Qiao Y, Guo Y, Yu K, et al. C3D-ConvLSTM based cow behaviour classification using video data for precision livestock farming[J]. Computers and electronics in agriculture, 2022, 193: 106650.
[37] Yu C, Ding Q, Bai Y. SBCP-YOLO-R3D: Student Behavior Recognition and Visualization Framework Using Improved YOLO and R3D for Class Video[J]. Journal of Artificial Intelligence and Technology, 2025.
[38] Behar N, Shrivastava M. ResNet50-Based Effective Model for Breast Cancer Classification Using Histopathology Images[J]. CMES-Computer Modeling in Engineering & Sciences, 2022, 130(2).
[39] Research on efficient classification algorithm for coal and gangue based on improved MobilenetV3-small
[40] Shi B, Li Y X, Yu X, Yan W. Short-term load forecasting based on modified particle swarm optimizer and fuzzy neural network model[J]. Systems Engineering-Theory and Practice, 2010, 30(1): 158-160.
Copyright (c) 2025 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.