Hybrid gradient boosting with SMOTE-augmented feature engineering for high-accuracy cardiac arrhythmia detection: a comparative supervised machine learning study
Keywords:
Cardiac Arrhythmia Detection, ECG Classification, Supervised Machine Learning, Gradient Boosting, Feature Engineering, Hyperparameter Optimization.
Abstract
Background: Cardiac arrhythmias are a major global health problem and cause around 15-20% of sudden cardiac deaths each year. Timely automated detection from electrocardiogram (ECG) signals remains a major challenge in clinical practice because of signal complexity, inter-patient variation, and significant class imbalance in clinical datasets.
Objective: This study proposes and evaluates a supervised machine learning pipeline for the automated binary classification of cardiac arrhythmias from multi-dimensional features extracted from the ECG. The pipeline combines gradient boosting classification, data augmentation using SMOTE, feature selection using SelectKBest, and systematic hyperparameter optimization using grid search with 5-fold stratified cross-validation.
Methods: A total of 2,000 ECG samples (970 normal and 1,030 arrhythmic) were collected and pre-processed with Z-score normalization and mean imputation. The top 12 of 20 candidate features were then selected using chi-squared feature selection. To address class imbalance, SMOTE was applied only to the training partition. Six classifiers (Gradient Boosting, Random Forest, Support Vector Machine, Decision Tree, K-Nearest Neighbors, and Logistic Regression) were trained, tuned, and benchmarked under the same experimental conditions.
Results: The proposed Gradient Boosting model attained a classification accuracy of 95.8%, precision of 96.1%, recall of 95.4%, F1-score of 95.7%, and AUC-ROC of 0.989, an improvement of 1.6–11.6 percentage points over the baseline classifiers. The ablation experiments showed that each pipeline stage contributed significantly to overall performance, and that combining SMOTE with hyperparameter optimization yielded a 5.3% gain in F1-score over the baseline configuration.
Conclusion: The proposed ECG arrhythmia detection framework performs competitively with recent state-of-the-art ECG classifiers and offers an interpretable, computationally efficient method for clinically deployable arrhythmia detection. The pipeline is generalizable to other bio-signal classification applications and is fully reproducible using open-source code.
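The abstract states that SMOTE was applied only to the training partition. The following is a minimal, standard-library sketch of that idea, not the study's released code: it splits a toy dataset with stratification, then oversamples the minority class in the training portion alone, so the held-out samples stay untouched. The function names (`smote`, `stratified_split`, `balance_training_set`), the 80/20 toy class ratio, the two-feature Gaussian data, and `k=5` are all illustrative assumptions. In practice a maintained implementation such as the `SMOTE` class in imbalanced-learn would likely be preferable.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def stratified_split(y, test_fraction=0.2, rng=None):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    rng = rng or random.Random(0)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        cut = int(round(len(idx) * test_fraction))
        test_idx += idx[:cut]
        train_idx += idx[cut:]
    return train_idx, test_idx


def balance_training_set(X_train, y_train, k=5, rng=None):
    """Oversample the minority class of the training data until classes match."""
    counts = {label: y_train.count(label) for label in set(y_train)}
    minority_label = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority_label]
    minority = [x for x, lab in zip(X_train, y_train) if lab == minority_label]
    new_points = smote(minority, deficit, k=k, rng=rng)
    return X_train + new_points, y_train + [minority_label] * deficit


# Toy data (NOT the study's ECG features): 80 samples of class 0, 20 of class 1.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

# Split first, then oversample the training part only.
train_idx, test_idx = stratified_split(y, test_fraction=0.2, rng=rng)
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

X_bal, y_bal = balance_training_set(X_train, y_train, k=5, rng=rng)
print("train class counts after SMOTE:", y_bal.count(0), y_bal.count(1))
print("test class counts (unchanged):", y_test.count(0), y_test.count(1))
```

Splitting before oversampling matters because synthetic points are interpolated from real samples. If SMOTE ran on the full dataset first, near-copies of test samples could leak into training and inflate the reported metrics. The same reasoning applies inside cross-validation, where resampling would need to be repeated within each training fold.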
Copyright (c) 2026 Dr. Vaibhav Bhushan Tyagi

This work is licensed under a Creative Commons Attribution 4.0 International License.