Medical application of deep-learning-based head pose estimation from RGB image sequence.

Journal: Computers in biology and medicine
Published Date:

Abstract

Telemedicine enables doctor-to-patient and doctor-to-doctor consultations and has helped address barriers to in-person care such as the COVID-19 pandemic, remote locations, long visit times, and patients' dependence on family members for transportation. Nevertheless, few studies have applied telemedicine to measuring head movement, which is essential for activities of daily living and is degraded by aging, trauma, pain, and degenerative disease. In recent years, artificial intelligence, including vision-based methods, has been used to measure cervical range of motion (CROM). However, these methods suffer from significant measurement errors and require depth cameras. In contrast, recent deep-learning-based head pose estimation (HPE) networks have achieved higher accuracy than previous methods, which makes them attractive for CROM measurement in telemedicine. This study proposes applying a deep neural network that combines multi-level pyramidal feature extraction, a bi-directional Pyramidal Feature Aggregation Structure (PFAS) for feature fusion, a modified Atrous Spatial Pyramid Pooling (ASPP) module for spatial and channel feature enhancement, and a multi-bin classification and regression module to derive the Euler angles as the head pose parameters. We evaluated the proposed technique on public datasets (300W_LP, AFLW2000, and BIWI), achieving performance comparable to previous algorithms, with mean MAE (mean absolute error) values of 3.36°, 3.50°, and 2.16° under several evaluation protocols. For CROM measurement in telemedicine, the proposed method achieved the lowest mean MAE, 3.73°, on a private medical dataset. It also achieved a fast inference speed of 2.27 ms per image. Thus, for both traditional HPE problems and CROM measurement applications, the proposed method offers accuracy, convenience, low computational requirements, and low operational costs (GitHub: https://github.com/nickuntitled/pyramid_based_HPE).
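
The abstract gives no implementation details for the multi-bin classification and regression module or for the mean MAE metric. The following is a minimal sketch of one common way such a head can be decoded and scored, not the authors' code. It assumes each Euler angle is predicted as logits over equal-width angle bins and is recovered as the probability-weighted average of the bin centers. The bin count, bin width, angle range, and all function names are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw bin logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_angle(logits, bin_width=3.0, min_angle=-99.0):
    """Decode one Euler angle (degrees) as the expected value over bins.

    Assumption: bins are contiguous and equal-width, starting at min_angle.
    The default values are illustrative, not taken from the abstract.
    """
    probs = softmax(logits)
    centers = [min_angle + (i + 0.5) * bin_width for i in range(len(logits))]
    return sum(p * c for p, c in zip(probs, centers))


def mean_absolute_error(predicted, actual):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mean_mae(pred_poses, true_poses):
    """Return (mean MAE, [yaw MAE, pitch MAE, roll MAE]) in degrees.

    Each pose is a (yaw, pitch, roll) tuple. The mean MAE is the average
    of the three per-angle MAE values.
    """
    per_axis = [
        mean_absolute_error(
            [pose[axis] for pose in pred_poses],
            [pose[axis] for pose in true_poses],
        )
        for axis in range(3)
    ]
    return sum(per_axis) / 3, per_axis


if __name__ == "__main__":
    # Hypothetical logits: 66 bins with all mass concentrated on bin 33.
    logits = [0.0] * 66
    logits[33] = 50.0
    print(decode_angle(logits))

    # Hypothetical predictions and ground truth for two frames.
    predicted = [(10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
    actual = [(7.0, 0.0, 0.0), (0.0, 5.0, 3.0)]
    print(mean_mae(predicted, actual))
```

Under this scheme, averaging over bin probabilities yields a continuous angle estimate even though the classifier output is discrete, and the mean MAE summarizes yaw, pitch, and roll errors in a single figure.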

Authors

  • Kittisak Chotikkakamthorn
    Department of Biomedical Engineering, Faculty of Engineering, Mahidol University, 999 Phutthamonthon 4 Road, Salaya, Nakhon Pathom, 73170, Thailand; Department of Electrical Engineering, College of Engineering, National Chung Cheng University, No. 168, Section 1, University Rd, Minxiong Township, Chia-Yi, 621301, Taiwan.
  • Wen-Nung Lie
    Department of Electrical Engineering, National Chung Cheng University, Chiayi, Taiwan.
  • Panrasee Ritthipravat
    Department of Biomedical Engineering, Faculty of Engineering, Mahidol University, 999 Phutthamonthon 4 Road, Salaya, 73170, Nakhon Pathom, Thailand. panrasee.rit@mahidol.ac.th.
  • Worapan Kusakunniran
    Faculty of Information and Communication Technology, Mahidol University, 999 Phutthamonthon 4 Road, Salaya, 73170, Nakhon Pathom, Thailand. worapan.kun@mahidol.edu.
  • Pimchanok Tuakta
    Department of Rehabilitation Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, 270 Rama 6 Road, 10400, Bangkok, Thailand.
  • Paitoon Benjapornlert
    Department of Rehabilitation Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, 270 Rama 6 Road, 10400, Bangkok, Thailand.