Creating interpretable deep learning models to identify species using environmental DNA sequences.

Journal: Scientific Reports

Published Date: Jul 28, 2025

Abstract

Monitoring species' presence in an ecosystem is crucial for conservation and for understanding habitat diversity, but it can be expensive and time-consuming. As a result, ecologists have begun using the DNA that animals naturally leave behind in water or soil (called environmental DNA, or eDNA) to identify the species present in an environment. Recent work has shown that, when used to identify species, convolutional neural networks (CNNs) can be as much as 150 times faster than ObiTools, a traditional method that does not use deep learning. However, CNNs are black boxes, meaning it is impossible to "fact check" why they predict that a given sequence belongs to a particular species. In this work, we introduce an interpretable, prototype-based CNN, built on the ProtoPNet framework, that surpasses previous accuracy on a challenging eDNA dataset. The network can visualize the sequences of bases that are most distinctive for each species in the dataset, and it introduces a novel skip connection that improves on the interpretability of the original ProtoPNet. Our results show that reducing reliance on the convolutional output increases both interpretability and accuracy.
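To make the prototype idea concrete, the sketch below compares a short prototype (a run of bases) against every window of an input sequence and reports the best match. It is a minimal illustration under stated assumptions, not the authors' architecture: the names `one_hot` and `best_prototype_match` are hypothetical, the comparison is done directly on the one-hot encoded input rather than on learned convolutional features, and cosine similarity is an assumed choice of similarity measure.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (4, length) array.

    Bases outside A/C/G/T (for example N) become all-zero columns.
    """
    encoded = np.zeros((4, len(seq)))
    for position, base in enumerate(seq.upper()):
        row = BASES.find(base)
        if row >= 0:
            encoded[row, position] = 1.0
    return encoded

def best_prototype_match(seq, prototype):
    """Slide a prototype over a sequence and return the best-matching window.

    Hypothetical helper: similarity is the cosine similarity between the
    one-hot prototype and each one-hot window of the same width.
    Returns (start index, similarity, matching subsequence).
    """
    x = one_hot(seq)
    p = one_hot(prototype)
    width = p.shape[1]
    if width == 0 or width > x.shape[1]:
        raise ValueError("prototype must be non-empty and no longer than the sequence")

    p_norm = np.linalg.norm(p)
    scores = []
    for start in range(x.shape[1] - width + 1):
        window = x[:, start:start + width]
        denom = np.linalg.norm(window) * p_norm
        # Windows (or prototypes) made only of unknown bases score zero.
        scores.append(float((window * p).sum() / denom) if denom > 0 else 0.0)

    best = int(np.argmax(scores))
    return best, scores[best], seq[best:best + width]
```

Calling `best_prototype_match("TTACGTAA", "ACGT")` locates the window starting at index 2, where the prototype matches exactly. Because the match is reported as a span of bases in the input, the evidence for a prediction can be read off directly, which is the kind of inspection the abstract describes.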

Authors

Samuel Waggoner

School of Computing and Information Science, University of Maine, Orono, 04469, USA. samuel.waggoner@maine.edu.
Jon Donnelly

From the Departments of Computer Science (J.D., L.M., A.J.B., C.R.) and Electrical and Computer Engineering (C.R.), Duke University, 308 Research Dr, LSRC Building D101, Duke Box 90129, Durham, NC 27708; Department of Radiology and Imaging Services, Emory University, Atlanta, Ga (H.T.); Department of Radiology, Harvard University, Cambridge, Mass (F.S.); and Department of Radiology, Duke University School of Medicine, Durham, NC (J.L.).
Rose Gurung

School of Computing and Information Science, University of Maine, Orono, 04469, USA.
Laura Jackson

University of Kansas School of Medicine, Kansas City, Kansas, U.S.A.
Chaofan Chen

School of Computing and Information Science, University of Maine, Orono, 04469, USA. chaofan.chen@maine.edu.

Keywords

Animals; Deep Learning; DNA, Environmental; Ecosystem; Neural Networks, Computer; Sequence Analysis, DNA

External Resources

PubMed ID: 40721613
