Research on Adaptive Audio Coding Optimization Algorithm Based on Multi-attribute Decision Making and Machine Learning

Wenyi Jiang; Ting Li; Kongju Zhao

doi:10.54097/p38e7966

Keywords:

Audio processing, Spectrum analysis, Fourier transform

Abstract

To balance storage efficiency, sound fidelity, and encoding complexity in digital audio processing, this paper proposes an adaptive audio coding optimization algorithm that integrates multi-attribute decision making and machine learning. First, the weights of the evaluation indices are determined by the projection pursuit method, and a comprehensive evaluation system covering file size, sound fidelity, codec complexity, and scenario applicability is constructed. Second, a multivariate response analysis framework for audio parameters is established; it reveals the nonlinear effects of sampling rate, bit depth, and other parameters on file size and sound quality, and a cost-effectiveness optimization model is designed to recommend the optimal parameter combination for different application scenarios. Third, an adaptive coding model based on two-stage classification and parameterization is developed: it extracts 37 time-frequency features, combines a support vector machine with a random forest classifier to reach 96.8% audio type recognition accuracy, and dynamically selects coding parameters according to spectral characteristics. Experimental results show that the proposed algorithm significantly improves storage efficiency while preserving sound quality, providing an effective solution for digital audio processing.
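
As a rough illustration of the first two steps, the sketch below computes the uncompressed PCM size for a given sampling rate and bit depth, then ranks candidate parameter sets by a weighted sum of min-max normalized attributes. It is not the authors' implementation. The function names, the min-max normalization, and the weighted-sum aggregation are assumptions made for this example, and the weights are taken as given inputs rather than derived by projection pursuit.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Size of raw, uncompressed PCM audio in bytes (no container header)."""
    bits = sample_rate_hz * bit_depth * channels * duration_s
    return bits // 8


def min_max(column, is_cost):
    """Scale one attribute column to [0, 1]; cost attributes are inverted
    so that a higher normalized value is always better."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # All candidates are equal on this attribute.
        return [1.0 for _ in column]
    if is_cost:
        return [(hi - v) / (hi - lo) for v in column]
    return [(v - lo) / (hi - lo) for v in column]


def weighted_scores(rows, weights, is_cost):
    """Weighted-sum score per candidate (assumed aggregation rule).

    rows    -- one list of attribute values per candidate
    weights -- one weight per attribute, assumed to sum to 1
    is_cost -- True where a smaller raw value is better (e.g. file size)
    """
    columns = list(zip(*rows))
    normalized = [min_max(col, cost) for col, cost in zip(columns, is_cost)]
    return [
        sum(w * col[i] for w, col in zip(weights, normalized))
        for i in range(len(rows))
    ]
```

For example, `pcm_size_bytes(44100, 16, 2, 60)` gives the raw size of one minute of CD-quality stereo audio. For raw PCM the size is simply proportional to the product of sampling rate and bit depth, which makes it a useful baseline against which the behavior of compressed formats can be compared.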
