TriCvT-DTI: Predicting Drug-Target Interactions Using Trimodal Representations and Convolutional Vision Transformers.
Journal:
IEEE Journal of Biomedical and Health Informatics
Published Date:
Jun 1, 2025
Abstract
Predicting interactions between drugs and their targets is vital for drug discovery and repositioning. Conventional techniques are slow and labor-intensive, whereas deep learning algorithms offer efficient solutions. However, deep learning methods often focus on a single drug representation or on simplistic combinations of representations, leading to suboptimal feature representation. Moreover, the prevalent use of convolutional neural networks (CNNs) for drug image representation neglects the need for both local and global drug information in Drug-Target Interaction (DTI) tasks. To address these challenges, we propose TriCvT-DTI, a novel approach that combines molecular images, chemical sequence features, and graph representations of drugs to comprehensively capture their structural, spatial, and functional aspects. TriCvT-DTI introduces a bidirectional multi-head attention mechanism for interactive feature learning between drugs and targets, enhancing performance by modeling complex relationships. By using Convolutional Vision Transformers (CvTs), TriCvT-DTI effectively extracts structural and spatial features from drug images. We evaluate our model on three datasets (Human, C. elegans, and Davis) and compare it with state-of-the-art methods. We then train TriCvT-DTI with uni-modal and bi-modal inputs to assess the impact of each modality on its performance. Experimental results demonstrate that TriCvT-DTI outperforms existing methods on both balanced and unbalanced datasets. Moreover, it shows strong generalization capabilities on the DTI task.
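The bidirectional multi-head attention idea mentioned in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the abstract does not specify the layer design, so the sketch assumes plain scaled dot-product cross-attention, omits learned query/key/value projections, residual connections, and normalization, and forms heads by simply slicing the feature dimension. The function names (`cross_attention`, `multi_head_cross_attention`, `bidirectional_attention`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: each query row attends over the context rows.

    Assumption: the context rows act as both keys and values (no learned projections).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted average of the context rows, one output feature at a time.
        attended.append(
            [sum(w * row[j] for w, row in zip(weights, context)) for j in range(dim)]
        )
    return attended


def multi_head_cross_attention(queries, context, num_heads):
    """Split the feature dimension into heads, attend per head, then concatenate."""
    dim = len(queries[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    outputs = [[] for _ in queries]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        head_out = cross_attention(
            [q[lo:hi] for q in queries], [c[lo:hi] for c in context]
        )
        for out_row, head_row in zip(outputs, head_out):
            out_row.extend(head_row)
    return outputs


def bidirectional_attention(drug_tokens, target_tokens, num_heads=2):
    """Run attention in both directions: drug -> target and target -> drug."""
    drug_ctx = multi_head_cross_attention(drug_tokens, target_tokens, num_heads)
    target_ctx = multi_head_cross_attention(target_tokens, drug_tokens, num_heads)
    return drug_ctx, target_ctx
```

Calling `bidirectional_attention(drug_tokens, target_tokens)` with two lists of equal-width feature vectors returns a target-aware representation for every drug token and a drug-aware representation for every target token. In a trained model these two outputs would typically be pooled and passed to a classifier, but that step is outside what the abstract describes.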