A Multimodal Pain Sentiment Analysis System Using Ensembled Deep Learning Approaches for IoT-Enabled Healthcare Framework.
Journal:
Sensors (Basel, Switzerland)
PMID:
40006452
Abstract
This study introduces a multimodal sentiment analysis system to assess and recognize human pain sentiments within an Internet of Things (IoT)-enabled healthcare framework. This system integrates facial expressions and speech-audio recordings to evaluate human pain intensity levels. This integration aims to enhance the recognition system's performance and enable a more accurate assessment of pain intensity. Such a multimodal approach supports improved decision making in real-time patient care, addressing limitations inherent in unimodal systems for measuring pain sentiment. So, the primary contribution of this work lies in developing a multimodal pain sentiment analysis system that integrates the outcomes of image-based and audio-based pain sentiment analysis models. The system implementation contains five key phases. The first phase focuses on detecting the facial region from a video sequence, a crucial step for extracting facial patterns indicative of pain. In the second phase, the system extracts discriminant and divergent features from the facial region using deep learning techniques, utilizing some convolutional neural network (CNN) architectures, which are further refined through transfer learning and fine-tuning of parameters, alongside fusion techniques aimed at optimizing the model's performance. The third phase performs the speech-audio recording preprocessing; the extraction of significant features is then performed through conventional methods followed by using the deep learning model to generate divergent features to recognize audio-based pain sentiments in the fourth phase. The final phase combines the outcomes from both image-based and audio-based pain sentiment analysis systems, improving the overall performance of the multimodal system. This fusion enables the system to accurately predict pain levels, including 'high pain', 'mild pain', and 'no pain'. The performance of the proposed system is tested with the three image-based databases such as a 2D Face Set Database with Pain Expression, the UNBC-McMaster database (based on shoulder pain), and the BioVid database (based on heat pain), along with the VIVAE database for the audio-based dataset. Extensive experiments were performed using these datasets. Finally, the proposed system achieved accuracies of 76.23%, 84.27%, and 38.04% for two, three, and five pain classes, respectively, on the 2D Face Set Database with Pain Expression, UNBC, and BioVid datasets. The VIVAE audio-based system recorded a peak performance of 97.56% and 98.32% accuracy for varying training-testing protocols. These performances were compared with some state-of-the-art methods that show the superiority of the proposed system. By combining the outputs of both deep learning frameworks on image and audio datasets, the proposed multimodal pain sentiment analysis system achieves accuracies of 99.31% for the two-class, 99.54% for the three-class, and 87.41% for the five-class pain problems.