CellExLink: End-to-end cell-type recognition and normalization in biomedical text

Journal: bioRxiv
Published Date:

Abstract

Cell-type extraction is an important task in biomedical text mining because the biomedical literature contains evidence about cell types and cell-type-related biological interactions that supports studies of disease mechanisms, therapeutic response, and translational biomedical modeling. However, current biomedical text-mining systems do not explicitly support cell-type extraction, provide only limited support for Cell Ontology normalization, or achieve limited accuracy for end-to-end cell-type extraction. These limitations can affect downstream tasks that depend on reliable cell-type information. Here, we present CellExLink, an end-to-end biomedical natural language processing pipeline built specifically for cell-type recognition and Cell Ontology normalization in biomedical text. The pipeline aims to improve extraction accuracy and practical usability in literature-mining workflows while accounting for computational efficiency in both recognition and normalization. We evaluate CellExLink across heterogeneous biomedical corpora and compare it with established and recent biomedical text-mining tools. The results show that CellExLink provides reliable cell-type recognition, Cell Ontology normalization, and end-to-end extraction across these corpora. By addressing the need for reliable end-to-end cell-type recognition and Cell Ontology normalization, CellExLink can support downstream tasks such as curation, search, relation extraction, and knowledge graph construction.

Authors

  • Nabijiang, A.
  • Shahriyari, L.

Categories