Representation transfer for differentially private drug sensitivity prediction.

Journal: Bioinformatics (Oxford, England)
Abstract

MOTIVATION: Human genomic datasets often contain sensitive information that limits the use and sharing of the data. In particular, simple anonymization strategies fail to provide a sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has achieved promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because it is more difficult to guarantee privacy in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) also needs to avoid leaking private information.
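As a minimal illustration of the differential-privacy guarantee described above (not the method of this paper), the classic Laplace mechanism releases a statistic with noise calibrated to its sensitivity and a privacy parameter epsilon; the function name and parameters here are illustrative assumptions:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release `value` with epsilon-differential privacy.

    Adds Laplace noise with scale sensitivity/epsilon, so that the
    released output distribution changes by at most a factor of
    exp(epsilon) when any single data point is changed.
    Illustrative sketch only, not the paper's algorithm.
    """
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: privately release a mean of n values bounded in [0, 1];
# the sensitivity of the mean is 1/n.
data = np.array([0.2, 0.5, 0.9, 0.4])
private_mean = laplace_mechanism(data.mean(),
                                 sensitivity=1.0 / len(data),
                                 epsilon=1.0)
```

Smaller epsilon means stronger privacy but more noise; in high dimensions each released coordinate consumes part of the privacy budget, which is why dimensionality reduction matters for genomic data.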

Authors

  • Teppo Niinimäki
    Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland.
  • Mikko A Heikkilä
    Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
  • Antti Honkela
    Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland. antti.honkela@helsinki.fi.
  • Samuel Kaski
    Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland.