ADNN-FER: adaptive deep neural networks for real-time facial expression recognition using hybrid attention, compound loss functions, and transformer-enhanced feature fusion

https://doi.org/10.55529/jaimlnn.52.128.137

Authors

  • Rizwan Hameed, Computer Science, School of Computing, Gold Campus, Superior University, Lahore, Pakistan.

Keywords:

Neural Networks, Facial Expression Recognition, EfficientNet, Transformer, Attention Mechanism, Affective Computing.

Abstract

Facial Expression Recognition (FER) is a core technology in affective computing, with applications in human–computer interaction, autonomous driving, clinical diagnostics, and adaptive education. However, existing FER models often suffer from sensitivity to occlusion and illumination variation, struggle with subtle or ambiguous expressions, and have difficulty balancing accuracy with real-time inference speed. To address these limitations, this paper proposes a novel hybrid framework, ADNN-FER, which integrates an EfficientNet-B5 convolutional backbone, a Convolutional Block Attention Module (CBAM), and a lightweight six-layer multi-head self-attention Transformer. The framework is trained end-to-end with a compound multi-objective loss function that combines classification loss, center loss, and intra-class variation loss to address class imbalance, annotation noise, and feature compactness simultaneously. The training pipeline further incorporates seven-stage data augmentation, 68-point facial landmark preprocessing, CLAHE normalization, and Action Unit auxiliary supervision. Bayesian optimization is employed for hyperparameter tuning, and model interpretability is analyzed using Grad-CAM and SHAP. ADNN-FER is extensively evaluated on the FER2013, RAF-DB, AffectNet, and FERPlus benchmark datasets, and ablation studies involving five model variants demonstrate the contribution of each module. Experimental results show that ADNN-FER achieves accuracies of 94.7% on FER2013, 95.1% on RAF-DB, 88.9% on AffectNet, and 92.4% on FERPlus, while operating at 52 FPS on an NVIDIA RTX 3090. Statistical analysis using paired t-tests with Bonferroni correction (p < 0.001) confirms significant improvement over ten competing methods. The proposed framework establishes a strong benchmark for real-time multi-class FER by effectively combining accuracy, efficiency, and interpretability.
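
As a rough illustration of how a compound loss of the kind named in the abstract might be assembled, the following minimal sketch sums a classification term, a center-loss term, and an intra-class variation term. It is not the paper's implementation. The abstract does not give the exact formulation, so several parts are assumptions: plain softmax cross-entropy stands in for the classification loss, the intra-class term is taken to be the mean squared distance of each feature to its class mean within the batch, and the weights `w_center` and `w_intra` and all function names are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center_loss(features, labels, centers):
    """Batch mean of 0.5 * ||f_i - c_{y_i}||^2 (standard center-loss form)."""
    total = sum(0.5 * sq_dist(f, centers[y]) for f, y in zip(features, labels))
    return total / len(features)


def intra_class_variation(features, labels):
    """ASSUMED form: mean squared distance of each feature to the mean of
    its own class, computed within the batch."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        mean = [sum(f[d] for f in members) / len(members) for d in range(dim)]
        total += sum(sq_dist(f, mean) for f in members)
    return total / len(features)


def compound_loss(logits, features, labels, centers,
                  w_center=0.01, w_intra=0.01):
    """Weighted sum of the three terms; the weights are illustrative only."""
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / len(labels)
    return (ce
            + w_center * center_loss(features, labels, centers)
            + w_intra * intra_class_variation(features, labels))


if __name__ == "__main__":
    logits = [[2.0, 0.5], [0.1, 1.5]]    # two samples, two classes
    features = [[1.0, 0.0], [0.0, 1.0]]  # embedding per sample
    labels = [0, 1]
    centers = [[0.8, 0.1], [0.1, 0.9]]   # one learned center per class
    print(compound_loss(logits, features, labels, centers))
```

In a real training loop the class centers would be learnable parameters updated alongside the network, and the weights would be tuned, for example by the Bayesian optimization the abstract mentions.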

Published

2025-11-21

How to Cite

Rizwan Hameed. (2025). ADNN-FER: adaptive deep neural networks for real-time facial expression recognition using hybrid attention, compound loss functions, and transformer-enhanced feature fusion. Journal of Artificial Intelligence, Machine Learning and Neural Network, 5(2), 128–137. https://doi.org/10.55529/jaimlnn.52.128.137