A multi-season machine learning approach to examine the training load and injury relationship in professional soccer

Authors: Majumdar, A., Bakirov, R., Hodges, D., McCullagh, S. and Rees, T.

Journal: Journal of Sports Analytics

Volume: 10

Issue: 1

Pages: 47-65

eISSN: 2215-0218

ISSN: 2215-020X

DOI: 10.3233/JSA-240718

https://eprints.bournemouth.ac.uk/39595/

Source: Web of Science (Lite)

A multi-season machine learning approach to examine the training load and injury relationship in professional soccer

Authors: Majumdar, A., Bakirov, R., Hodges, D., McCullagh, S. and Rees, T.

Journal: Journal of Sports Analytics

Volume: 10

Issue: 1

Pages: 47-65

DOI: 10.3233/JSA-240718

https://eprints.bournemouth.ac.uk/39595/

Source: Manual

A multi-season machine learning approach to examine the training load and injury relationship in professional soccer

Authors: Majumdar, A., Bakirov, R., Hodges, D., McCullagh, S. and Rees, T.

Journal: Journal of Sports Analytics

Volume: 10

Issue: 1

Pages: 47-65

ISSN: 2215-020X

Abstract:

OBJECTIVES: The purpose of this study was to use machine learning to examine the relationship between training load and soccer injury with a multi-season dataset from one English Premier League club.

METHODS: Participants were 35 male professional soccer players (aged 25.79±3.75 years, range 18–37 years; height 1.80±0.07 m, range 1.63–1.95 m; weight 80.70±6.78 kg, range 66.03–93.70 kg), with data collected from the 2014–2015 season until the 2018–2019 season. A total of 106 training load variables (40 GPS, 6 personal information, 14 physical, 4 psychological, 14 ACWR, 14 MSWR and 14 EWMA variables) were examined in relation to 133 non-contact injuries, with a high imbalance ratio of 0.013.
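As an illustration of how two of the derived load families named above are commonly computed, the following minimal sketch derives an acute:chronic workload ratio (ACWR) and an exponentially weighted moving average (EWMA) from a daily load series. The 7-day and 28-day windows, the smoothing factor 2 / (span + 1), and all function names are assumptions made for this example; the abstract does not state the definitions used in the study.

```python
from typing import List


def rolling_mean(loads: List[float], window: int) -> List[float]:
    """Mean of the most recent `window` days (fewer at the start of the series)."""
    out = []
    for i in range(len(loads)):
        chunk = loads[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def acwr(loads: List[float], acute: int = 7, chronic: int = 28) -> List[float]:
    """Acute:chronic workload ratio using rolling averages.

    The 7/28-day windows are a common convention, assumed here.
    """
    acute_mean = rolling_mean(loads, acute)
    chronic_mean = rolling_mean(loads, chronic)
    return [a / c if c > 0 else float("nan")
            for a, c in zip(acute_mean, chronic_mean)]


def ewma(loads: List[float], span: int) -> List[float]:
    """Exponentially weighted moving average with alpha = 2 / (span + 1)."""
    if not loads:
        return []
    alpha = 2.0 / (span + 1)
    out = [loads[0]]  # seed the average with the first observation
    for x in loads[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


# Hypothetical daily loads (arbitrary units): four steady weeks, then a spike week.
daily_load = [300.0] * 28 + [600.0] * 7
ratio = acwr(daily_load)
smoothed = ewma(daily_load, span=7)
```

A constant load yields a ratio of 1.0; a sudden increase in the most recent week pushes the ratio above 1.0, which is the kind of signal such features are intended to capture.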

RESULTS: XGBoost and an Artificial Neural Network were trained on four and a half seasons’ data, and the resulting models were then tested on the following half season’s data. During the first four and a half seasons, there were 341 injuries; during the next half season there were 37 injuries. To interpret and visualize each model’s output and the contribution of each feature (i.e., training load variable) to it, we used the SHapley Additive exPlanations (SHAP) approach. Of the 37 injuries, XGBoost correctly predicted 26, with recall and precision of 73% and 10% respectively. The Artificial Neural Network correctly predicted 28, with recall and precision of 77% and 13% respectively. In the Artificial Neural Network model (the relatively more accurate of the two), last injury area and weight appeared to be the most important features contributing to the prediction of injury.
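For readers unfamiliar with the two metrics reported above, the sketch below shows how recall and precision follow from confusion-matrix counts, and why precision tends to be low when the positive class is rare. The counts are hypothetical values chosen for illustration only; they are not taken from the study, and the function name is an assumption of this example.

```python
def recall_precision(tp: int, fn: int, fp: int) -> tuple:
    """Return (recall, precision) from confusion-matrix counts.

    recall    = TP / (TP + FN): share of actual injuries that were flagged
    precision = TP / (TP + FP): share of flagged cases that were actual injuries
    """
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, precision


# Hypothetical counts, NOT from the paper: 10 injuries among many player-days.
# The model flags 80 player-days and catches 8 of the 10 injuries.
recall, precision = recall_precision(tp=8, fn=2, fp=72)
print(f"recall={recall:.2f}, precision={precision:.2f}")
```

With rare positives, even a modest false-alarm rate on the large negative class produces many false positives, so a model can recover most injuries (high recall) while most of its alerts are still wrong (low precision).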

CONCLUSIONS: This was the first study of its kind to use an Artificial Neural Network and a multi-season dataset for injury prediction. Our results demonstrate the potential to predict injuries with high recall, thereby identifying most of the injury cases, although precision suffered because of the high class imbalance. This approach to using machine learning provides potentially valuable insights for soccer organizations and practitioners when monitoring training load and injuries.

https://eprints.bournemouth.ac.uk/39595/

Source: BURO EPrints