uHAF: a unified hierarchical annotation framework for cell type standardization and harmonization.

Journal: Bioinformatics (Oxford, England)
PMID:

Abstract

SUMMARY: In single-cell transcriptomics, inconsistent cell type annotations, arising from varied naming conventions and differing levels of hierarchical granularity, impede data integration, machine learning applications, and meaningful evaluation of annotation methods. To address this challenge, we developed the unified Hierarchical Annotation Framework (uHAF), which comprises organ-specific hierarchical cell type trees (uHAF-T) and a large language model-based mapping tool (uHAF-Agent). uHAF-T provides standardized hierarchical references for 38 organs, enabling consistent label unification and analysis at different levels of granularity. uHAF-Agent uses GPT-4 to accurately map diverse and informal cell type labels onto uHAF-T nodes, streamlining the harmonization process. By simplifying label unification, uHAF enhances data integration, supports machine learning applications, and enables biologically meaningful evaluations of annotation methods. Our framework serves as an essential resource for standardizing cell type annotations and fostering collaborative refinement in the single-cell research community.
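To illustrate what label unification "at different levels of granularity" can mean in practice, here is a minimal Python sketch, assuming a cell type tree stored as child-to-parent links. The tree fragment, its node names, the `RAW_TO_NODE` lookup table, and the helpers `path_to_root`, `at_level`, and `harmonize` are all hypothetical: they are not uHAF-T's actual contents, file format, or API, and the fixed dictionary merely stands in for the label-to-node mapping that uHAF-Agent performs with GPT-4.

```python
from typing import Dict, List, Optional

# Hypothetical tree fragment stored as child -> parent links (root has None).
# Node names are illustrative only; they are NOT taken from uHAF-T.
PARENT: Dict[str, Optional[str]] = {
    "Cell": None,
    "Immune cell": "Cell",
    "T cell": "Immune cell",
    "CD4+ T cell": "T cell",
    "CD8+ T cell": "T cell",
    "B cell": "Immune cell",
    "Epithelial cell": "Cell",
}

# Hypothetical stand-in for an LLM-based mapper: informal dataset labels
# resolved to tree nodes via a fixed lookup table.
RAW_TO_NODE: Dict[str, str] = {
    "CD4 T": "CD4+ T cell",
    "Tcells_CD8": "CD8+ T cell",
    "B": "B cell",
}


def path_to_root(label: str) -> List[str]:
    """Return the ancestry of a node ordered [root, ..., label]."""
    if label not in PARENT:
        raise KeyError(f"unknown node: {label}")
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def at_level(label: str, depth: int) -> str:
    """Collapse a node to its ancestor at `depth` (root = 0).

    Nodes shallower than `depth` are returned unchanged.
    """
    path = path_to_root(label)
    return path[min(depth, len(path) - 1)]


def harmonize(raw_labels: List[str], mapping: Dict[str, str], depth: int) -> List[str]:
    """Map informal labels to tree nodes, then report them at one shared depth."""
    return [at_level(mapping[raw], depth) for raw in raw_labels]


if __name__ == "__main__":
    labels = ["CD4 T", "Tcells_CD8", "B"]
    print(harmonize(labels, RAW_TO_NODE, depth=2))  # ['T cell', 'T cell', 'B cell']
    print(harmonize(labels, RAW_TO_NODE, depth=1))  # ['Immune cell', 'Immune cell', 'Immune cell']
```

Because each mapped label carries its full ancestry, annotations made at different granularities can be compared at whichever shared depth suits the analysis; in this toy tree, a fine `CD4+ T cell` label and a coarse `T cell` label agree once both are collapsed to depth 2.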

Authors

  • Haiyang Bian
    MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China.
  • Yinxin Chen
    MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China.
  • Lei Wei
    MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China.
  • Xuegong Zhang
    MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China.