Explainable XGBoost framework for multi-class disease severity prediction: a clinical machine learning study with SHAP-based interpretability
Keywords:
Disease Severity Prediction, XGBoost, SHAP, Machine Learning, Clinical Decision Support, Class Imbalance.
Abstract
Non-communicable diseases (NCDs) are a major and growing problem, responsible for around 74% of all deaths globally. Early and accurate stratification of disease severity is essential for timely therapeutic intervention, efficient use of clinical resources, and patient benefit. Traditional clinical scoring systems (CSSs) such as APACHE II and SOFA rely on a limited set of handcrafted features and miss the complex, non-linear interactions between variables in multi-morbid patients. This study develops a novel, interpretable machine learning (ML) pipeline based on the extreme gradient boosting (XGBoost) model for four-class disease severity prediction (Mild, Moderate, Severe, Critical). A retrospective multi-centre clinical dataset of 12,450 patient records, collected between 2018 and 2024 from three tertiary care hospitals in Maharashtra, India, was de-identified for analysis. Data preprocessing comprised multiple imputation by chained equations (MICE) for missing values, target encoding for categorical features, robust scaler normalization, and the synthetic minority oversampling technique (SMOTE) for class imbalance. The XGBoost hyperparameters were tuned with the Optuna framework, using Bayesian optimization over 200 trials with stratified 5-fold cross-validation (CV). Compared with four baseline classifiers, the proposed XGBoost model achieved an accuracy of 93.6%, precision of 92.8%, recall of 93.1%, F1-score of 92.9%, and AUC-ROC of 0.971, significantly better than Logistic Regression (78.4%), Random Forest (83.1%), Support Vector Machine (80.7%), and Multilayer Perceptron (85.3%). The Expected Calibration Error (ECE) was 0.042, indicating that the predicted probabilities were well calibrated. SHAP analysis identified blood glucose level, age, and BMI as the three most discriminative clinical features, with HbA1c a significant fourth.
Patient-level SHAP explanations bridged the transparency gap between model predictions and clinical decision-making. The framework shows that high predictive accuracy and clinical interpretability can be achieved together, while remaining easily deployable in resource-constrained healthcare environments connected to the IIoT and extensible to future federated learning architectures.
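To make the calibration figure concrete, the following is a minimal sketch of how an Expected Calibration Error can be computed for a multi-class classifier. The abstract does not state the binning scheme, so the 10 equal-width confidence bins, the use of the maximum class probability as the confidence, and the function name expected_calibration_error are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    n_bins: int = 10,  # assumed; the abstract does not give the number of bins
) -> float:
    """Top-label ECE: weighted mean of |accuracy - confidence| over confidence bins.

    probs  -- one row of class probabilities per sample (e.g. 4 severity classes)
    labels -- true class index per sample
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and the same length")

    conf_sum = [0.0] * n_bins  # summed confidence per bin
    correct = [0] * n_bins     # number of correct predictions per bin
    count = [0] * n_bins       # number of samples per bin

    for row, y in zip(probs, labels):
        row = list(row)
        conf = max(row)            # confidence = highest predicted probability
        pred = row.index(conf)     # predicted class = argmax
        # Equal-width bins [k/n, (k+1)/n); a confidence of 1.0 goes in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        correct[b] += int(pred == y)
        count[b] += 1

    n = len(labels)
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        accuracy = correct[b] / count[b]
        avg_conf = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece


# Toy example (made-up numbers, not study data): two predictions at 0.75
# confidence, one right and one wrong, so accuracy 0.5 vs confidence 0.75.
toy_probs = [[0.75, 0.25, 0.0, 0.0], [0.75, 0.25, 0.0, 0.0]]
toy_labels = [0, 1]
print(expected_calibration_error(toy_probs, toy_labels))  # 0.25
```

A value near zero means that predicted confidence tracks observed accuracy; under this definition, a reported ECE of 0.042 corresponds to an average gap of about four percentage points between the two.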
Copyright (c) 2025 Gouse Baig Mohammad

This work is licensed under a Creative Commons Attribution 4.0 International License.