Toward clearer recognition and easier usefulness: development of a cross-lingual atherosclerotic cerebrovascular disease ontology.

Journal: Database : the journal of biological databases and curation
PMID:

Abstract

Atherosclerotic cerebrovascular disease causes a large number of deaths and disabilities, yet it has not received enough attention. Little information, statistics, or data on the disease has been published, and no systematic concept dataset has been released to help clinicians clarify its scope and support research. This study aimed to develop a cross-lingual atherosclerotic cerebrovascular disease ontology; describe its workflow, schema, hierarchical structure, and highlighted content; design a new rehabilitation ontology; evaluate the ontology; and illustrate its application in real-world scenarios. We followed nine steps based on the Ontology Development 101 methodology combined with expert opinions: collection and specification of clinical requirements, background investigation and knowledge acquisition, ontology selection and reuse, scope identification, schema definition, concept extraction, concept extension, ontology verification, and ontology evaluation. We evaluated the proposed ontology on a literature classification task. The current ontology has 10 top-level classes: clinical manifestation, comorbidity, complication, diagnosis, model of atherosclerotic cerebrovascular disease, pathogenesis, prevention, rehabilitation, risk factor, and treatment. It contains 1715 concepts across 11 levels, covering 4588 Chinese terms, 6617 English terms, and 972 definitions. The ontology can be applied in real-world scenarios such as information retrieval, new expression discovery, named entity recognition, and knowledge fusion, and the use case showed that it offers satisfactory support for related medical scenarios. It also proved useful in text classification: combined with a pretrained model, the weighted F1 score exceeded 80%.
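The abstract does not say how the F1 score reported above was computed. Assuming it denotes the usual support-weighted average of per-class F1 scores, a minimal sketch of that metric follows. The function name and the toy labels are illustrative and are not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # Count true positives and false positives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels (hypothetical), named after three of the top-level classes.
y_true = ["treatment", "treatment", "risk factor", "diagnosis"]
y_pred = ["treatment", "diagnosis", "risk factor", "diagnosis"]
score = weighted_f1(y_true, y_pred)
```

Weighting by support means frequent classes dominate the score, which matters when the literature is unevenly distributed across the ontology's classes.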
The proposed ontology provided a clear set of cross-lingual concepts and terms with an explicit hierarchical structure, helping scientific researchers to quickly retrieve relevant medical literature, assisting data scientists to efficiently identify relevant contents in electronic health records, and providing a clear domain framework for academic reference. Database URL: https://bioportal.bioontology.org/ontologies/ACVD_ONTOLOGY.
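As an illustration of how a hierarchy of concepts with Chinese and English terms can support retrieval, the sketch below models a concept as an identifier, a parent link, and two term lists, then matches terms in free text. It is a simplified assumption about the data shape, not the published schema: the class name, field names, identifiers, and sample entries are all hypothetical, and matching is plain substring search.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    concept_id: str                      # hypothetical identifier scheme
    parent_id: Optional[str]             # None marks a top-level class
    en_terms: List[str] = field(default_factory=list)
    zh_terms: List[str] = field(default_factory=list)


def build_term_index(concepts: List[Concept]) -> Dict[str, str]:
    """Map every English (lowercased) and Chinese term to its concept ID."""
    index = {}
    for c in concepts:
        for term in c.en_terms:
            index[term.lower()] = c.concept_id
        for term in c.zh_terms:
            index[term] = c.concept_id
    return index


def path_to_root(by_id: Dict[str, Concept], concept_id: str) -> List[str]:
    """Return concept IDs from the top-level class down to concept_id."""
    path = []
    current = concept_id
    while current is not None:
        path.append(current)
        current = by_id[current].parent_id
    return list(reversed(path))


def find_concepts(text: str, index: Dict[str, str]) -> set:
    """Return IDs of concepts whose terms occur in text (substring match)."""
    lowered = text.lower()
    return {cid for term, cid in index.items() if term in lowered}


# Hypothetical sample entries, not copied from the ontology.
concepts = [
    Concept("C1", None, ["risk factor"], ["危险因素"]),
    Concept("C2", "C1", ["hypertension", "high blood pressure"], ["高血压"]),
]
by_id = {c.concept_id: c for c in concepts}
index = build_term_index(concepts)
```

With this index, an English sentence and a Chinese sentence that mention the same condition resolve to the same concept ID, and `path_to_root` places that concept under its top-level class.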

Authors

  • Hetong Ma
    Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, 3rd Yabao Road, Beijing, 100020, China.
  • Liu Shen
    Intelligent Computing Department, Institute of Medical Information & Library, Chinese Academy of Medical Sciences/Peking Union Medical College, No. 3 Yabao Road, Beijing 100020, China.
  • Jiayang Wang
    College of Materials Science and Engineering, Zhejiang University of Technology, Hangzhou, China.
  • Shilong Wang
    Department of Food Science and Human Nutrition, University of Florida, 572 Newell Dr., Gainesville, FL 32611, United States.
  • Min Wang
    National and Local Joint Engineering Research Center of Ecological Treatment Technology for Urban Water Pollution, Wenzhou University, Wenzhou 325035, China.
  • Meng Wang
    State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150001, China.
  • Zixiao Li
    Beijing Tiantan Hospital, Capital Medical University, Beijing, China. lizixiao2008@hotmail.com.
  • Jiao Li
    CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China. yinhao@scsio.ac.cn.