The Classification of Short Scientific Texts Using Pretrained BERT Model.

Journal: Studies in health technology and informatics

Published Date: May 27, 2021

Abstract

Automated text classification is a natural language processing (NLP) technology that could significantly facilitate the selection of scientific literature. A topic-specific dataset of 630 article abstracts was obtained from the PubMed database. We proposed 27 parameterized variants of the PubMedBERT model and 4 ensemble models to solve a binary classification task on that dataset. Each classification approach was evaluated in 300 tests with resampling. The best PubMedBERT model achieved an F1-score of 0.857, while the best ensemble model reached an F1-score of 0.853. We concluded that the classification quality for short scientific texts might be improved using the latest state-of-the-art approaches.
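The evaluation described in the abstract (repeated tests with resampling, scored by F1, with ensembles combining several models) can be illustrated with a minimal sketch. The abstract does not state how the resamples were drawn or how the ensembles combined their members, so the random hold-out split, the `test_fraction` parameter, and the majority-vote rule below are assumptions made for illustration, and all function names are hypothetical.

```python
import random


def f1_score(y_true, y_pred):
    """F1-score for the positive class (label 1) in a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions):
    """Combine per-model binary predictions by simple majority.

    `predictions` holds one list of 0/1 labels per model.
    Assumption: ties are resolved in favour of the positive class.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) >= n_models else 0 for votes in zip(*predictions)]


def resampled_splits(n_items, n_tests, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for repeated random hold-out splits.

    Assumption: each test reshuffles the whole dataset independently.
    """
    rng = random.Random(seed)
    n_test = max(1, int(n_items * test_fraction))
    for _ in range(n_tests):
        indices = list(range(n_items))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]
```

With these helpers, `resampled_splits(630, 300)` would produce 300 train/test partitions of a 630-item dataset; a model (or a `majority_vote` over several models) would be fitted and scored with `f1_score` on each partition, and the scores summarized across tests.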

Authors

Gleb Danilov

Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.

Timur Ishankulov

Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.

Konstantin Kotik

Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.

Yuriy Orlov

Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow, Russian Federation.

Mikhail Shifrin

Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.

Alexander Potapov

Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.

Keywords

Natural Language Processing; PubMed

External Resources

PubMed: 34042710
