Coding genomes with gapped pattern graph convolutional network.

Journal: Bioinformatics (Oxford, England)
PMID:

Abstract

MOTIVATION: Genome sequencing technologies produce vast numbers of genomic sequences. Neural network-based methods are prime candidates for extracting insights from these sequences because they apply well to large and diverse datasets. However, the highly variable lengths of genome sequences make it difficult to represent them as input to a neural network. Genetic variations further complicate tasks that involve sequence comparison or alignment.

Authors

  • Ruo Han Wang
    Department of Computer Science, City University of Hong Kong Shenzhen Research Institute, Shenzhen, 518063, China.
  • Yen Kaow Ng
    Department of Computer Science, City University of Hong Kong Shenzhen Research Institute, Shenzhen, 518063, China.
  • Xianglilan Zhang
    State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China.
  • Jianping Wang
    Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, China. jianwang@cityu.edu.hk.
  • Shuai Cheng Li
    Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR, China.