Vec2image: an explainable artificial intelligence model for the feature representation and classification of high-dimensional biological data by vector-to-image conversion.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Feature representation and discriminative learning are proven models and technologies in artificial intelligence fields; however, major challenges for machine learning on large biological datasets are learning an effective model with mechanistical explanation on the model determination and prediction. To satisfy such demands, we developed Vec2image, an explainable convolutional neural network framework for characterizing the feature engineering, feature selection and classifier training that is mainly based on the collaboration of principal component coordinate conversion, deep residual neural networks and embedded k-nearest neighbor representation on pseudo images of high-dimensional biological data, where the pseudo images represent feature measurements and feature associations simultaneously. Vec2image has achieved better performance compared with other popular methods and illustrated its efficiency on feature selection in cell marker identification from tissue-specific single-cell datasets. In particular, in a case study on type 2 diabetes (T2D) by multiple human islet scRNA-seq datasets, Vec2image first displayed robust performance on T2D classification model building across different datasets, then a specific Vec2image model was trained to accurately recognize the cell state and efficiently rank feature genes relevant to T2D which uncovered potential T2D cellular pathogenesis; and next the cell activity changes, cell composition imbalances and cell-cell communication dysfunctions were associated to our finding T2D feature genes from both population-shared and individual-specific perspectives. Collectively, Vec2image is a new and efficient explainable artificial intelligence methodology that can be widely applied in human-readable classification and prediction on the basis of pseudo image representation of biological deep sequencing data.

Authors

  • Hui Tang
    Department of Pharmacy, The Affiliated Hospital of Southwest Medical University, Luzhou, China.
  • Xiangtian Yu
    Clinical Reasearch Center, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China.
  • Rui Liu
    School of Education, China West Normal University, Nanchong, Sichuan, China.
  • Tao Zeng
    Department of Urology, Second Affiliated Hospital of Nanchang University, Nanchang, China.