DNA sequences performs as natural language processing by exploiting deep learning algorithm for the identification of N4-methylcytosine.

Journal: Scientific Reports

Published Date: Jan 8, 2021

Abstract

N4-methylcytosine (4mC) is a biochemical modification of DNA that affects genetic processes such as gene expression, genomic imprinting, chromosome stability, and cell development without altering the nucleotide sequence. In this work, a computational model, 4mCNLP-Deep, treats DNA sequences as natural language: it uses word embeddings as the vector representation and a deep-learning CNN to predict 4mC and non-4mC sites in the C. elegans genome dataset. A range of experimental settings was explored, including the corpus k-mer size and the number of folds in k-fold cross-validation, to find the best-performing configuration. Using a 3-mer word2vec corpus and 3-fold cross-validation, 4mCNLP-Deep outperforms the state-of-the-art predictor on five evaluation metrics: accuracy (ACC) of 0.9354, Matthews correlation coefficient (MCC) of 0.8608, specificity (Sp) of 0.8996, sensitivity (Sn) of 0.9563, and area under the curve (AUC) of 0.9731, which are improvements of 1.1%, 0.6%, 0.58%, 0.77%, and 4.89%, respectively. Finally, we developed an online web server, http://nsclbio.jbnu.ac.kr/tools/4mCNLP-Deep/, so that experimental researchers can obtain results easily.
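As a rough illustration of two ideas named in the abstract, the sketch below splits a DNA sequence into k-mer "words" (the kind of corpus a word2vec model could be trained on) and computes four of the reported metrics from confusion-matrix counts. The function names `kmer_tokens` and `binary_metrics`, the overlapping stride of 1, and the example counts are assumptions made for illustration; the abstract does not specify how the authors tokenized sequences or implemented their evaluation.

```python
import math


def kmer_tokens(sequence, k=3, stride=1):
    """Split a DNA sequence into k-mer 'words'.

    Assumption: overlapping k-mers with stride 1; the abstract only
    states that a 3-mer corpus was used, not the stride.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, Sn, Sp, and MCC from confusion-matrix counts.

    These are the standard textbook definitions, not code from the paper.
    Assumes each class has at least one sample, so Sn and Sp are defined.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity: fraction of true 4mC sites recovered
    sp = tn / (tn + fp)  # specificity: fraction of non-4mC sites recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical sequence and counts, for illustration only.
    print(kmer_tokens("ACGTAC", k=3))
    print(binary_metrics(tp=90, tn=80, fp=20, fn=10))
```

Each list of k-mers plays the role of a sentence, so a collection of tokenized sequences can be passed to any word-embedding trainer. AUC is omitted here because it needs ranked prediction scores rather than confusion-matrix counts.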

Authors

Abdul Wahab

Hilal Tayara
Department of Electronics and Information Engineering, Chonbuk National University, Jeonju 54896, South Korea. Electronic address: hilaltayara@jbnu.ac.kr.

Zhenyu Xuan
Department of Biological Sciences, The University of Texas at Dallas, Richardson, 75080, USA. Electronic address: zhenyu.xuan@utdallas.edu.

Kil To Chong
Division of Electronic Engineering, and Advanced Research Center of Electronics and Information, Chonbuk National University, Jeonju-Si 54896, South Korea. Electronic address: kitchong@jbnu.ac.kr.

Keywords

Base Sequence; Computer Simulation; Deep Learning; Natural Language Processing

External Resources

PubMed ID: 33420191
