DNA Privacy: Analyzing Malicious DNA Sequences Using Deep Neural Networks.

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Abstract

Recent advances in next-generation sequencing technologies have led to the successful insertion of video information into DNA using synthesized oligonucleotides, and several attempts have been made to embed larger data into living organisms. This process of embedding messages is called steganography; it is used to hide and watermark data to protect intellectual property. Steganalysis, by contrast, refers to a group of algorithms that detect information hidden in covert media. Various methods have been developed to detect messages embedded in conventional covert channels, but these steganalysis algorithms are mostly limited to common covert media. The most common detection approaches, such as frequency-analysis-based methods, often overlook important signals when applied directly to DNA steganography and are easily bypassed by recently developed steganography techniques. To address these limitations, we propose a sequence-learning-based method that uses neural networks to analyze malicious DNA sequences. The proposed method learns the intrinsic distributions of DNA sequences and identifies variations in those distributions using a classification score that predicts whether a sequence is coding or non-coding. Based on our experiments and results, we have developed a framework to safeguard against DNA steganography.
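The abstract describes detection as learning how coding and non-coding sequences are distributed and then flagging sequences whose classification score departs from that learned behavior. The sketch below illustrates only that idea, under explicit assumptions: a smoothed k-mer log-likelihood-ratio scorer stands in for the paper's deep neural network; the hypothetical `bytes_to_dna` uses a simple 2-bit byte-to-base mapping rather than any specific published steganography scheme; and the z-score rule in `is_suspicious` (with `z=3.0`) is our own choice, not the paper's decision rule. All function names and the toy training sequences are illustrative.

```python
import math
import statistics
from collections import Counter

K = 2  # k-mer length for the toy scorer (assumption, not from the paper)

def train_kmer_logprob(seqs, k=K):
    """Return an add-one-smoothed log-probability function over DNA k-mers."""
    counts = Counter()
    for s in seqs:
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values()) + 4 ** k  # smoothing mass for all 4**k k-mers
    return lambda kmer: math.log((counts[kmer] + 1) / total)

def coding_score(seq, logp_coding, logp_noncoding, k=K):
    """Mean per-k-mer log-likelihood ratio: > 0 leans coding, < 0 leans non-coding.
    Stand-in for a neural-network classification score (assumption)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return sum(logp_coding(m) - logp_noncoding(m) for m in kmers) / len(kmers)

def bytes_to_dna(data):
    """Hypothetical 2-bit encoding (00->A, 01->C, 10->G, 11->T), high bits first."""
    bases = "ACGT"
    return "".join(bases[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))

def is_suspicious(seq, clean_scores, logp_c, logp_n, z=3.0):
    """Flag seq if its score falls more than z standard deviations
    below the mean score of known-clean coding sequences."""
    mu = statistics.mean(clean_scores)
    sd = statistics.stdev(clean_scores)
    return coding_score(seq, logp_c, logp_n) < mu - z * sd

# Toy training data (illustrative only): GC-rich "coding", AT-rich "non-coding".
coding = ["GCGGCCGCGGCC", "GCCGCGGCGGCC", "GGCGCCGCCGCG"]
noncoding = ["ATTATAATTATA", "TATTAATATTAT", "AATATTTAATAT"]
logp_c = train_kmer_logprob(coding)
logp_n = train_kmer_logprob(noncoding)
clean_scores = [coding_score(s, logp_c, logp_n) for s in coding]

# Embed a short message inside an otherwise coding-like sequence.
stego = "GCGGCC" + bytes_to_dna(b"hi") + "GCGGCC"
print(is_suspicious(coding[0], clean_scores, logp_c, logp_n))  # False
print(is_suspicious(stego, clean_scores, logp_c, logp_n))      # True
```

The embedded message introduces k-mers that the coding model rarely or never saw, pulling the stego sequence's score below the range of the clean sequences. A real detector would replace the k-mer scorer with a trained sequence model and calibrate the threshold on held-out data.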

Authors

  • Ho Bae
    Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea.
  • Seonwoo Min
  • Hyun-Soo Choi
    Department of Electrical and Computer Engineering, Seoul National University, room 908 Bldg. 301, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Korea.
  • Sungroh Yoon
    Department of Electrical and Computer Engineering and Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea.