DeepBarcoding: Deep Learning for Species Classification Using DNA Barcoding.

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Published Date:

Abstract

DNA barcodes, which are short sequence fragments, are used for species identification. Advances in sequencing technologies have made DNA sequences from different organisms easy and fast to acquire, and DNA barcodes have received growing attention as a result. DNA sequence analysis tools therefore play an increasingly crucial role in species identification. This study proposes deep barcoding, a deep learning framework for species classification using DNA barcodes. Deep barcoding takes raw sequence data as input, represents each sequence with one-hot encoding as a one-dimensional image, and analyzes it with a deep convolutional neural network combined with a fully connected deep neural network. It achieves an average accuracy of >90 percent on both simulated and real datasets. Although deep learning yields outstanding performance for species classification with DNA sequences, its application remains a challenge. The deep barcoding model is a potential tool for species classification and can elucidate DNA barcode-based species identification.
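
The one-hot representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoding details, so everything here is an assumption made for illustration: the helper name `one_hot_encode`, the A/C/G/T channel order, the fixed input length, and the choice to map ambiguous bases (such as N) and padding positions to all-zero rows.

```python
# Hypothetical helper: the name, the A/C/G/T channel order, and the handling
# of ambiguous bases and padding are assumptions, not taken from the paper.
BASES = "ACGT"


def one_hot_encode(sequence, length):
    """Return `length` rows, each a 4-element one-hot vector for one base.

    Sequences longer than `length` are truncated. Ambiguous bases (e.g. N)
    and padding positions become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in sequence.upper()[:length]:
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    # Pad short sequences with zero rows so every input has the same shape.
    rows.extend([0, 0, 0, 0] for _ in range(length - len(rows)))
    return rows


# Each printed row is one position along the sequence; the four columns act
# as the channels of a one-dimensional image.
for row in one_hot_encode("ACGTN", 6):
    print(row)
```

A length-by-4 matrix of this kind is the sort of input a one-dimensional convolutional network could consume, with the four base columns playing the role of image channels.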

Authors

  • Cheng-Hong Yang
    Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan. chyang@cc.kuas.edu.tw.
  • Kuo-Chuan Wu
  • Li-Yeh Chuang
    Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan. chuang@isu.edu.tw.
  • Hsueh-Wei Chang
    Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 80708, Taiwan, R.O.C.