autoBioSeqpy: A Deep Learning Tool for the Classification of Biological Sequences.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Deep learning has proven to be a powerful method with applications in various fields, including image, language, and biomedical data analysis. Thanks to libraries and toolkits such as TensorFlow, PyTorch, and Keras, researchers can rapidly build models with different deep learning architectures and data sets. However, the available neural network implementations built on these toolkits are usually designed for a specific research problem and are difficult to transfer to other work. Here, we present autoBioSeqpy, a tool that uses deep learning for biological sequence classification. The advantage of this tool is its simplicity: users only need to prepare the input data set and then use a command line interface. autoBioSeqpy then automatically executes a series of customizable steps, including text reading, parameter initialization, sequence encoding, model loading, training, and evaluation. In addition, the tool provides various ready-to-apply, adaptable model templates to improve the usability of these networks. We demonstrate the application of autoBioSeqpy to three biological sequence problems: the prediction of type III secreted proteins, protein subcellular localization, and CRISPR/Cas9 sgRNA activity. autoBioSeqpy is freely available with examples at https://github.com/jingry/autoBioSeqpy.
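
To make the "text reading" and "sequence encoding" steps concrete, the following is a minimal sketch of what such steps could look like for protein sequences. It is not taken from autoBioSeqpy and does not reflect its actual API: the function names read_fasta and one_hot_encode, the FASTA input format, the 20-letter residue alphabet, and the zero-padding scheme are all assumptions chosen for illustration.

```python
# Illustrative sketch only: hypothetical helpers, not autoBioSeqpy's code or API.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 one-hot matrix (list of lists) for one sequence.

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Residues outside the alphabet (e.g. 'X') also map
    to all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()[:max_len]):
        idx = INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


if __name__ == "__main__":
    fasta = ">seq1\nACD\n>seq2\nMKV\n"
    for name, seq in read_fasta(fasta):
        encoded = one_hot_encode(seq, max_len=5)
        print(name, len(encoded), "rows x", len(encoded[0]), "columns")
```

A fixed max_len keeps every encoded sequence the same shape, which is what most neural network input layers expect. Other encodings, such as integer indices fed to an embedding layer, would be equally plausible choices.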

Authors

  • Runyu Jing
    College of Cybersecurity, Sichuan University, Chengdu 610065, China.
  • Yizhou Li
  • Li Xue
    HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China.
  • Fengjuan Liu
    School of Geography and Resources, Guizhou Education University, Guiyang 550018, China.
  • Menglong Li
    College of Chemistry, Sichuan University, Chengdu 610064, PR China. Electronic address: liml@scu.edu.cn.
  • Jiesi Luo
    College of Chemistry, Sichuan University, Chengdu 610064, PR China.