Autoscaling Stateful Microservices Under Variable Load and Traffic Uncertainty

Minjae Rhee; Jiesi Yang

doi:10.54097/kxfd4726

Keywords:

Autoscaling, Stateful microservices, Cloud-native, Traffic uncertainty, Workload prediction, Kubernetes, Service mesh, Resource management

Abstract

Modern cloud-native applications increasingly adopt a microservice architecture (MSA) to achieve modularity, independent deployment, and scalability. While autoscaling of stateless microservices has been studied extensively, stateful microservices (SMS) present unique challenges because they depend on persistent state, session continuity, and data locality. Autoscaling SMS under variable load and traffic uncertainty requires mechanisms that go beyond conventional reactive approaches. This paper provides a comprehensive review of existing methodologies for autoscaling SMS, encompassing reactive, proactive, and hybrid scaling strategies. We examine the role of machine learning (ML) and deep learning (DL) in traffic forecasting and workload prediction, the management of distributed state during scaling events, and the integration of service mesh technologies to mitigate traffic uncertainty. Key challenges, including cold-start latency, state migration overhead, and quality of service (QoS) degradation during scaling transitions, are discussed in depth. We further review benchmark frameworks and evaluation methodologies used to assess autoscaling systems. The paper concludes by identifying critical open research problems and future directions in this rapidly evolving domain.

