Creating interpretable deep learning models to identify species using environmental DNA sequences.

Journal: Scientific reports
Published Date:

Abstract

Monitoring species' presence in an ecosystem is crucial for conservation and for understanding habitat diversity, but it can be expensive and time-consuming. As a result, ecologists have begun using the DNA that animals naturally leave behind in water or soil (called environmental DNA, or eDNA) to identify the species present in an environment. Recent work has shown that, when used to identify species, convolutional neural networks (CNNs) can be as much as 150 times faster than ObiTools, a traditional method that does not use deep learning. However, CNNs are black boxes: it is impossible to "fact check" why they predict that a given sequence belongs to a particular species. In this work, we introduce an interpretable, prototype-based CNN built on the ProtoPNet framework that surpasses the previously reported accuracy on a challenging eDNA dataset. The network can visualize the sequences of bases that are most distinctive for each species in the dataset, and it introduces a novel skip connection that improves the interpretability of the original ProtoPNet. Our results show that reducing reliance on the convolutional output increases both interpretability and accuracy.
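The core idea of a prototype-based classifier, scoring a sequence by how closely some stretch of it matches a learned prototype, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the similarity function from the original ProtoPNet, log((d + 1) / (d + eps)) with d the squared L2 distance, max-pooled over positions. As a simplification, it compares a prototype directly against one-hot windows of the raw DNA sequence rather than against learned CNN feature maps, and it omits the novel skip connection, whose details are not described here. The function names and the eps value are hypothetical.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dim one-hot vectors (A, C, G, T).
    Ambiguous bases such as N map to an all-zero vector (an assumption)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def squared_distance(window, prototype):
    """Squared L2 distance between two equal-length lists of 4-dim vectors."""
    return sum(
        (x - y) ** 2
        for wv, pv in zip(window, prototype)
        for x, y in zip(wv, pv)
    )


def prototype_similarity(seq, prototype_seq, eps=1e-4):
    """ProtoPNet-style similarity log((d + 1) / (d + eps)), max-pooled over
    every position where the prototype fits inside the sequence.

    Returns (best_score, best_position). If the sequence is shorter than the
    prototype, returns (-inf, -1). This sketch works on raw one-hot input;
    the real network compares prototypes to learned convolutional features."""
    x = one_hot(seq)
    p = one_hot(prototype_seq)
    k = len(p)
    best, best_pos = float("-inf"), -1
    for i in range(len(x) - k + 1):
        d = squared_distance(x[i:i + k], p)
        # Smaller distance -> larger similarity; an exact match gives log(1/eps).
        score = math.log((d + 1) / (d + eps))
        if score > best:
            best, best_pos = score, i
    return best, best_pos


score, pos = prototype_similarity("TTACGGATCC", "CGGAT")
print(pos)  # 3: the window "CGGAT" starting at index 3 matches exactly
```

The returned position identifies which stretch of bases most resembles the prototype, which is the kind of evidence a prototype-based network can surface to explain a species prediction.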

Authors

  • Samuel Waggoner
    School of Computing and Information Science, University of Maine, Orono, 04469, USA. samuel.waggoner@maine.edu.
  • Jon Donnelly
    Department of Computer Science, Duke University, 308 Research Dr, LSRC Building D101, Duke Box 90129, Durham, NC 27708, USA.
  • Rose Gurung
    School of Computing and Information Science, University of Maine, Orono, 04469, USA.
  • Laura Jackson
    University of Kansas School of Medicine, Kansas City, Kansas, USA.
  • Chaofan Chen
    School of Computing and Information Science, University of Maine, Orono, 04469, USA. chaofan.chen@maine.edu.