Generalized genomic data sharing for differentially private federated learning.

Journal: Journal of biomedical informatics
Published Date:

Abstract

The success of Machine Learning (ML) methods has largely been attributed to the quality and quantity of the available data, which can be spread across multiple owners. Federated Learning (FL) over distributed datasets often offers a reliable way to gain valuable insight from such data. Genomic data have also proven to be sensitive, which calls for additional safety mechanisms before any sharing or ML operation. We propose a generalized gene expression data sharing method based on a differentially private mechanism. Because the number of available genes is large, we also reduce the data dimension to accommodate smaller privacy budgets, and we use an exponential mechanism to create a private histogram from the numeric expression data. The output histogram can be used in any federated machine learning setting with multiple data owners. The proposed solution was submitted to the genomic data security and privacy competition iDash 2020, where it ranked third among 55 teams. We extended the proposed solution and experimented with two different machine learning algorithms and several settings. The experimental results show that it takes around 8 s to train a model while achieving 0.89 AUC with a privacy budget of only 5. This paper outlines a method for sharing gene expression data for Federated Learning using a privacy-preserving mechanism. The experimental settings and the recent competition results show the efficacy of the method, which can be further extended to other genomic datasets and machine learning algorithms.
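
The abstract does not spell out how the private histogram is built, so the following is only a minimal sketch of one way an exponential mechanism could turn numeric expression values into a noisy histogram. Everything here is an assumption made for illustration: the function names, the bin edges, the utility (negative distance between a candidate bin and the true bin), and the choice to spend the budget epsilon on each value independently. It is not the authors' implementation, and it does not cover the dimension reduction step or how the budget is split across genes.

```python
import bisect
import math
import random


def exponential_mechanism_bin(value, edges, epsilon, rng):
    """Select a bin index for `value` with the exponential mechanism.

    Assumed utility: u(value, b) = -|b - true_bin|. Its sensitivity is at
    most k - 1 (the largest possible change when the value moves between
    the two extreme bins), where k is the number of bins.
    """
    k = len(edges) - 1
    # Locate the true bin and clamp out-of-range values to the edge bins.
    true_bin = min(max(bisect.bisect_right(edges, value) - 1, 0), k - 1)
    sensitivity = max(k - 1, 1)
    # Standard exponential-mechanism weights: exp(eps * u / (2 * sensitivity)).
    weights = [
        math.exp(epsilon * -abs(b - true_bin) / (2 * sensitivity))
        for b in range(k)
    ]
    return rng.choices(range(k), weights=weights, k=1)[0]


def private_histogram(values, edges, epsilon, seed=None):
    """Build a histogram whose per-value bin assignments are randomized."""
    rng = random.Random(seed)
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[exponential_mechanism_bin(v, edges, epsilon, rng)] += 1
    return counts


# Hypothetical expression values for one gene, binned into three ranges.
expression = [0.5, 1.5, 1.7, 2.9]
bin_edges = [0.0, 1.0, 2.0, 3.0]
noisy_counts = private_histogram(expression, bin_edges, epsilon=5.0, seed=0)
```

In this sketch a smaller epsilon spreads the selection probability more evenly across bins, so the histogram drifts further from the true counts, while a very large epsilon makes each value land in its true bin almost surely. Each data owner could compute such a histogram locally and share only the counts.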

Authors

  • Md Momin Al Aziz
    Computer Science, University of Manitoba, 66 Chancellors Circle, Winnipeg R3T 2N2, Manitoba, Canada. Electronic address: azizmma@cs.umanitoba.ca.
  • Md Monowar Anjum
    Computer Science, University of Manitoba, 66 Chancellors Circle, Winnipeg R3T 2N2, Manitoba, Canada.
  • Noman Mohammed
    Department of Computer Science, University of Manitoba, Winnipeg, Manitoba R3T 5V6, Canada.
  • Xiaoqian Jiang
    School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.