The Classification of Scientific Literature for Its Topical Tracking on a Small Human-Prepared Dataset.

Journal: Studies in health technology and informatics
Published Date:

Abstract

The number of scientific publications is constantly growing to make their processing extremely time-consuming. We hypothesized that a user-defined literature tracking may be augmented by machine learning on article summaries. A specific dataset of 671 article abstracts was obtained and nineteen binary classification options using machine learning (ML) techniques on various text representations were proposed in a pilot study. 300 tests with resamples were performed for each classification option. The best classification option demonstrated AUC = 0.78 proving the concept in general and indicating a potential for solution improvement.

Authors

  • Gleb Danilov
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Timur Ishankulov
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Yuriy Orlov
    Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow, Russian Federation.
  • Mikhail Shifrin
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Konstantin Kotik
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
  • Alexander Potapov
    Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.