Evaluation of a structure-based method for ab initio gene detection using deep learning
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
In this work, a novel method for detecting exons within genomic DNA sequences was implemented and evaluated. The method is structure-based and inspired by recent work from Sharma et al.: nucleotide sequences are converted into physicochemical profiles using trinucleotide and tetranucleotide mappings whose values were estimated via molecular dynamics simulations [8,12]. Three deep learning models with architectures suited to multidimensional sequence classification were trained on the structural profiles of sequences at the junctions between exons and introns, as well as of sequences sampled from other regions of a human reference genome. The trained models showed promising performance on the evaluation set, but will require a much more sophisticated and robust post-processing approach to achieve compelling exon- or gene-level classification performance in realistic use cases. ChemEXIN [12], a tool by Sharma et al. based on the same structural approach, was also evaluated. The results indicate that the approach taken in this work to developing and applying the models offers a major improvement over ChemEXIN and may more accurately reflect the potential of the underlying idea.
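To make the sequence-to-profile conversion concrete, the following is a minimal Python sketch of a sliding-window k-mer lookup, assuming each trinucleotide (or tetranucleotide) maps to a fixed tuple of numeric parameters. The function names, the number of features, and the table values are hypothetical placeholders introduced for illustration; they are not the molecular dynamics-derived parameters from [8,12], nor necessarily the exact procedure used in this work.

```python
from itertools import product


def make_placeholder_table(k, n_features=2):
    """Build a lookup table with one entry per k-mer over A, C, G, T.

    The values are arbitrary placeholders (assumption): a real table would
    hold the physicochemical parameters estimated for each k-mer.
    """
    table = {}
    for i, kmer in enumerate(product("ACGT", repeat=k)):
        table["".join(kmer)] = tuple(float(i + j) for j in range(n_features))
    return table


def structural_profile(seq, table, k):
    """Slide a window of length k over seq and look up each k-mer.

    Returns one feature tuple per window position, so a sequence of
    length n yields n - k + 1 rows. Windows containing bases outside
    A, C, G, T (for example N) have no table entry and yield None.
    """
    seq = seq.upper()
    profile = []
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        profile.append(table.get(kmer))
    return profile


# Hypothetical usage with a trinucleotide table (k = 3).
tri_table = make_placeholder_table(3)
profile = structural_profile("ACGTAC", tri_table, 3)
```

Each row of `profile` corresponds to one window position, which gives the multidimensional sequence that a classifier could consume. How ambiguous bases and sequence ends are handled is a design choice this sketch leaves open.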