Data Structure Trend Prediction Based on XGBoost and DFR Integration
DOI:
https://doi.org/10.54097/jp7aae88
Keywords:
XGBoost, DFR Model, Event Importance
Abstract
This study proposes a predictive modeling framework based on XGBoost and ensemble learning techniques to forecast national outcomes and structural trends in large-scale competition datasets. First, a multi-dimensional feature system incorporating static, dynamic, and interaction variables is constructed to characterize country-specific attributes. Based on this, an XGBoost-based nonlinear regression model is developed to estimate both gold and total medal counts, while integrating host country effects and event-scale adjustments. Next, to address prediction challenges for countries with sparse historical performance, a hierarchical clustering strategy is implemented to identify potential first-time medal winners and assess rank shifts. Finally, a hybrid Difference-in-Differences and Random Forest Regression (DFR) model is introduced to evaluate the quantitative effect of external interventions, with a focus on marginal contribution estimation and residual analysis. Through these methods, the framework achieves improved predictive accuracy, enhanced interpretability of event-level influence, and a refined understanding of competition structure dynamics.
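The abstract does not specify how the Difference-in-Differences and Random Forest components of the DFR model are combined, so the following is only a minimal sketch of the standard two-group, two-period Difference-in-Differences estimate that such a model would build on. The function name `did_estimate` and all numbers are hypothetical and invented for illustration; they are not taken from the study.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Standard 2x2 difference-in-differences estimate.

    Returns the change in the treated group's mean outcome minus the
    change in the control group's mean outcome.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome counts (illustrative only, not real data).
treated_pre = [10, 12, 11]    # treated units before the intervention
treated_post = [16, 17, 18]   # treated units after the intervention
control_pre = [9, 10, 11]     # control units before
control_post = [11, 12, 13]   # control units after

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)
```

Under the usual parallel-trends assumption, the returned value is read as the effect attributable to the intervention. A hybrid approach might then model the remaining residuals with a nonlinear regressor such as a random forest, but that step is an assumption here and is not shown.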
License
Copyright (c) 2025 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








