A Machine Learning Pipeline to Screen Large In Vivo Molecular Data to Curate Disease Signatures of High Translational Potential.
Journal: Methods in molecular biology (Clifton, N.J.)
PMID: 39900768
Abstract
The low success rate of human clinical studies has long been attributed to a capability gap, namely the ineffective translation of animal data to the human context. Several corrective measures have been evaluated to bridge this gap, including strict guidelines for selecting animal models for a given disease and alternative models such as tissues-on-chip. The current hypothesis holds that humans and the mammals that precede them in the phylogenetic tree respond to a stress in a fundamentally similar way; however, the corresponding molecular mechanisms are not identical across these species. A deliberate strategy is therefore needed to curate, from animal data, the candidates with high translational potential. To this end, we developed an analytical tool that screens in vivo results, such as genomic, proteomic, and epigenomic data, with two primary objectives. The first objective is to identify molecules whose sequences are conserved across the phylogenetic tree. The second objective is to find molecules that are perturbed similarly across the phylogenetic tree in response to a stress of interest. A machine learning (ML) algorithm converges these two sets of molecules to curate their common features, which show both phylogenetic homology in their molecular makeup and similar behavior across the phylogenetic tree. This ML pipeline would be most beneficial in scenarios where the inventory of human samples is minimal, such as rare diseases or chemical, biological, radiological, and nuclear (CBRN) exposures. The strategy does risk overlooking human-exclusive signatures; nevertheless, the ML approach is poised to refine animal data into results of high translational potential with minimal false-positive and false-negative entries.
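The two screening objectives and their convergence can be illustrated with a minimal sketch. This is not the authors' ML algorithm, which the abstract does not specify; it substitutes a plain set intersection for the convergence step. The function names, the 0.8 identity threshold, the species, and the molecule names (GENE_A, GENE_B, GENE_C) are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: a set intersection stands in for the unspecified ML step.

def conserved_molecules(identity, threshold=0.8):
    """Objective 1: keep molecules whose sequence identity to the human
    ortholog meets the (assumed) threshold in every animal species.

    identity: {molecule: {species: fraction identity, 0.0 to 1.0}}
    """
    return {
        molecule
        for molecule, per_species in identity.items()
        if per_species and all(v >= threshold for v in per_species.values())
    }


def similarly_perturbed(direction):
    """Objective 2: keep molecules that change in the same direction in
    every species under the stress of interest.

    direction: {molecule: {species: +1 (up), -1 (down), or 0 (unchanged)}}
    """
    selected = set()
    for molecule, per_species in direction.items():
        signs = set(per_species.values())
        # Exactly one shared direction, and that direction is a real change.
        if len(signs) == 1 and 0 not in signs:
            selected.add(molecule)
    return selected


def curate(identity, direction, threshold=0.8):
    """Converge the two sets: molecules that are both conserved and
    similarly perturbed."""
    return conserved_molecules(identity, threshold) & similarly_perturbed(direction)


# Made-up example values, not data from the study.
identity = {
    "GENE_A": {"mouse": 0.92, "rat": 0.90},
    "GENE_B": {"mouse": 0.95, "rat": 0.60},  # poorly conserved in rat
    "GENE_C": {"mouse": 0.88, "rat": 0.91},
}
direction = {
    "GENE_A": {"mouse": 1, "rat": 1},
    "GENE_B": {"mouse": -1, "rat": -1},
    "GENE_C": {"mouse": 1, "rat": -1},  # opposite responses
}

print(sorted(curate(identity, direction)))  # → ['GENE_A']
```

In this toy input, GENE_B fails the conservation screen and GENE_C fails the perturbation screen, so only GENE_A survives both. A strict intersection like this also shows the trade-off the abstract notes: any signature absent from the animal data, including human-exclusive ones, cannot be recovered.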