A machine learning-based framework to identify type 2 diabetes through electronic health records.

Journal: International Journal of Medical Informatics
Published Date:

Abstract

OBJECTIVE: Discovering diverse genotype-phenotype associations for Type 2 Diabetes Mellitus (T2DM) via genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) requires identifying more cases (T2DM subjects) and controls (subjects without T2DM), e.g., from Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from a low recall rate and can miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop, as a pilot study, a semi-automated machine learning framework that liberalizes the filtering criteria to improve the recall rate while keeping the false positive rate low.

Authors

  • Tao Zheng
    Guangzhou Institute of Energy Conversion, Chinese Academy of Sciences, Guangzhou 510640, People's Republic of China; Key Laboratory of Renewable Energy, Chinese Academy of Sciences, Guangzhou 510640, People's Republic of China. Electronic address: zhengtao@ms.giec.ac.cn.
  • Wei Xie
    Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, United States of America.
  • Liling Xu
    Tongren Hospital Shanghai Jiao Tong University, Shanghai, China.
  • Xiaoying He
    Department of Endocrinology, the First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China.
  • Ya Zhang
    Department of Plant Protection, College of Plant Protection, Hunan Agricultural University, Changsha, China. Electronic address: zhangya230@126.com.
  • Mingrong You
    Division of Epidemiology, Vanderbilt University, Nashville, TN, USA.
  • Gong Yang
    Division of Epidemiology, Vanderbilt University, Nashville, TN, USA.
  • You Chen
    Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.