Data Quality Estimation Via Model Performance: Machine Learning as a Validation Tool.

Journal: Studies in health technology and informatics
PMID:

Abstract

In our recent study, classifying neurosurgical operative reports into routinely used expert-derived classes yielded an F-score not exceeding 0.74. This study aimed to test how improving the classification scheme (the target variable) affected short-text classification with deep learning on real-world data. We redesigned the target variable based on three strict principles, applied where applicable: pathology, localization, and manipulation type. Deep learning performance improved significantly, with the best result obtained when classifying operative reports into 13 classes (accuracy = 0.995, F1 = 0.990). Reasonable text classification with machine learning should be a two-way process: model performance must be ensured by an unambiguous textual representation reflected in the corresponding target variables. At the same time, the validity of human-generated codification can be inspected via machine learning.
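
The metrics quoted above, and the idea of using them to inspect a human-made coding scheme, can be illustrated with a minimal sketch. It is not the authors' code: the class names, the toy labels, the 0.75 threshold, and the choice of macro-averaged F1 are all assumptions made for illustration (the abstract does not state which F1 averaging was used). Classes with a low per-class F1 are flagged as candidates for review of the target variable.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of reports whose predicted class equals the expert-assigned class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical expert labels and model predictions for six reports.
y_true = ["tumor_resection", "tumor_resection", "shunt", "shunt", "biopsy", "biopsy"]
y_pred = ["tumor_resection", "tumor_resection", "shunt", "biopsy", "biopsy", "biopsy"]

scores = per_class_f1(y_true, y_pred)

# Classes the model separates poorly may point to ambiguous class definitions;
# the 0.75 threshold is arbitrary and chosen only for this example.
flagged = [label for label, f1 in scores.items() if f1 < 0.75]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
print("classes to review:", flagged)
```

In this toy setting a low score for one class does not prove the class definition is at fault; it only marks where a manual look at the labels and the texts is most likely to pay off.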

Authors

  • Gleb Danilov
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Konstantin Kotik
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Michael Shifrin
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Yulia Strunina
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Tatiana Pronkina
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Tatiana Tsukanova
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Vladimir Nepomnyashiy
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Nikolay Konovalov
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Valeriy Danilov
    Kazan State Medical University, Kazan, Russian Federation.
  • Alexander Potapov
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.