Fast and scalable neural embedding models for biomedical sentence classification.

Journal: BMC Bioinformatics

Abstract

BACKGROUND: Biomedical literature is expanding rapidly, and tools that help locate information of interest are needed. To this end, a multitude of approaches for classifying sentences in biomedical publications according to their coarse semantic and rhetorical categories (e.g., Background, Methods, Results, Conclusions) have been devised, with recent state-of-the-art results reported for a complex deep learning model. Recent evidence has shown that shallow and wide neural models such as fastText can provide results that are competitive with or superior to complex deep learning models while requiring drastically lower training times and offering better scalability. We analyze the efficacy of the fastText model for classifying biomedical sentences in the PubMed 200k RCT benchmark, and introduce a simple pre-processing step that enables the application of fastText to sentence sequences. Furthermore, we explore the utility of two unsupervised pre-training approaches in scenarios where labeled training data are limited.
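The abstract does not spell out the pre-processing step, but the general idea of adapting a bag-of-words classifier such as fastText to sentence sequences can be sketched as follows. This is a minimal illustration, not the authors' exact method: it assumes fastText's standard supervised input format (`__label__<CLASS>` prefixes) and adds a hypothetical context token encoding the previous sentence's label so the classifier can exploit sequence structure.

```python
# Sketch of fastText-style supervised input formatting for sentence
# sequences (an assumed variant, not necessarily the paper's method).
# Each sentence becomes "__label__<CLASS> <tokens>", and a special
# token carrying the previous sentence's label is appended so that a
# bag-of-words model can use sequence context.

def to_fasttext_line(label, sentence, prev_label=None):
    """Format one labeled sentence for fastText's supervised mode."""
    tokens = sentence.lower().split()
    if prev_label is not None:
        # Hypothetical context feature: the previous sentence's label.
        tokens.append(f"prevlabel_{prev_label.lower()}")
    return f"__label__{label} " + " ".join(tokens)

def format_abstract(labeled_sentences):
    """Format a sequence of (label, sentence) pairs from one abstract."""
    lines, prev = [], None
    for label, sent in labeled_sentences:
        lines.append(to_fasttext_line(label, sent, prev))
        prev = label
    return lines

abstract = [
    ("BACKGROUND", "Biomedical literature is expanding rapidly."),
    ("METHODS", "We trained a fastText classifier."),
]
for line in format_abstract(abstract):
    print(line)
```

A file of such lines could then be passed directly to fastText's supervised training mode (e.g., `fasttext supervised -input train.txt -output model`), which is where the model's speed advantage over deep architectures comes from.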

Authors

  • Asan Agibetov
    Italian National Research Council, Via De Marini 6, Genoa, 16149, Italy.
  • Kathrin Blagec
    Section for Artificial Intelligence and Decision Support, Medical University of Vienna, Währinger Strasse 25A, OG1, Vienna, 1090, Austria.
  • Hong Xu
    Department of Neurosurgery, Changshu Hospital Affiliated to Soochow University, Changshu, China.
  • Matthias Samwald
    Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria. matthias.samwald@meduniwien.ac.at.