Assessing the Performance of Machine Learning Methods Trained on Public Health Observational Data: A Case Study From COVID-19.

Journal: Statistics in Medicine
Published Date:

Abstract

From early in the coronavirus disease 2019 (COVID-19) pandemic, there was interest in using machine learning methods to predict COVID-19 infection status based on vocal audio signals, for example, cough recordings. However, early studies were limited both in how data were collected and in how the performance of the proposed predictive models was assessed. This article describes how these limitations have been overcome in a study carried out by the Turing-RSS Health Data Laboratory and the UK Health Security Agency. As part of the study, the UK Health Security Agency collected a dataset of acoustic recordings, SARS-CoV-2 infection status and extensive study participant meta-data. This allowed us to rigorously assess state-of-the-art machine learning techniques to predict SARS-CoV-2 infection status based on vocal audio signals. The lessons learned from this project should inform future studies on statistical evaluation methods to assess the performance of machine learning techniques for public health tasks.

Authors

  • Davide Pigoli
    Department of Mathematics, King's College London, UK.
  • Kieran Baker
    Department of Mathematics, King's College London, UK.
  • Jobie Budd
    London Centre for Nanotechnology, University College London, London, UK.
  • Lorraine Butler
    UK Health Security Agency, London, UK.
  • Harry Coppock
    The Alan Turing Institute, London, UK.
  • Sabrina Egglestone
    UK Health Security Agency, London, UK.
  • Steven G Gilmour
    Department of Mathematics, King's College London, UK.
  • Chris Holmes
    Department of Statistics, University of Oxford, Oxford, UK.
  • David Hurley
    UK Health Security Agency, London, UK.
  • Radka Jersakova
    The Alan Turing Institute, London, UK.
  • Ivan Kiskin
    Centre for Vision, Speech and Signal Processing, University of Surrey, UK.
  • Vasiliki Koutra
    Department of Mathematics, King's College London, UK.
  • Jonathon Mellor
    UK Health Security Agency, London, UK.
  • George Nicholson
    The Alan Turing Institute, London, UK.
  • Joe Packham
    UK Health Security Agency, London, UK.
  • Selina Patel
    Division of Medicine, University College London, UK.
  • Richard Payne
    UK Health Security Agency, London, UK.
  • Stephen J Roberts
    Department of Engineering Science, University of Oxford, UK.
  • Björn W Schuller
    GLAM - the Group on Language, Audio, & Music, Imperial College London, London, United Kingdom.
  • Ana Tendero-Cañadas
    UK Health Security Agency, London, UK.
  • Tracey Thornley
    Pharmacy Practice and Policy Division, University of Nottingham, UK.
  • Alexander Titcomb
    UK Health Security Agency, London, UK.