Protocol to identify functional doppelgängers and verify biomedical gene expression data using doppelgangerIdentifier.

Journal: STAR Protocols
Published Date:

Abstract

Functional doppelgängers (FDs) are independently derived sample pairs that confound machine learning (ML) model performance when split across training and validation sets. Here, we detail the use of doppelgangerIdentifier (DI), covering software installation, data preparation, doppelgänger identification, and functional testing. We demonstrate examples with biomedical gene expression data. We also provide guidelines for the selection of user-defined function arguments. For complete details on the use and execution of this protocol, please refer to Wang et al. (2022).

Authors

  • Li Rong Wang
    School of Computer Science and Engineering, Nanyang Technological University, Singapore.
  • Xiuyi Fan
    School of Computer Science and Engineering, Nanyang Technological University, Singapore.
  • Wilson Wen Bin Goh
    School of Biological Sciences, Nanyang Technological University, Singapore 637551, Republic of Singapore. Electronic address: wilsongoh@ntu.edu.sg.