Generative Modeling of RNA Sequence Families with Restricted Boltzmann Machines.

Journal: Methods in molecular biology (Clifton, N.J.)
PMID:

Abstract

In this chapter, we discuss the application of restricted Boltzmann machines (RBMs) to modeling sequence families of structured RNA molecules. RBMs are simple two-layer machine learning models that can capture the intricate sequence dependencies induced by secondary and tertiary structure, as well as mechanisms of structural flexibility. The resulting models can be used to design allosteric RNAs such as riboswitches, and they have recently been experimentally validated as generative models for the SAM-I riboswitch aptamer domain sequence family. We introduce RBMs mathematically and practically, providing self-contained code examples to download the necessary training sequence data, train the RBM, and sample novel sequences. We present in detail the implementation of the algorithms needed to use RBMs, focusing on applications in biological sequence modeling.
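
As a rough illustration of the two-layer architecture and the train-then-sample workflow described above, the following is a minimal sketch and not the chapter's own code. It assumes categorical (Potts) visible units, one per alignment column, binary hidden units, and a single step of contrastive divergence (CD-1) per update, chosen here only for brevity; the abstract does not state which hidden-unit type or training algorithm the chapter uses. The names `TinyRBM`, `encode`, and `decode`, the toy sequences, and all hyperparameters are hypothetical.

```python
import math
import random

ALPHABET = "ACGU-"  # assumed encoding: four nucleotides plus the alignment gap
Q = len(ALPHABET)


def encode(seq):
    """Map an aligned RNA string to a list of symbol indices."""
    return [ALPHABET.index(c) for c in seq]


def decode(v):
    """Map a list of symbol indices back to an RNA string."""
    return "".join(ALPHABET[a] for a in v)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class TinyRBM:
    """Toy RBM: Potts visible units, binary hidden units (illustrative only)."""

    def __init__(self, n_sites, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_sites, self.n_hidden = n_sites, n_hidden
        # fields[i][a]: bias for symbol a at alignment column i
        self.fields = [[0.0] * Q for _ in range(n_sites)]
        self.hbias = [0.0] * n_hidden
        # w[m][i][a]: coupling between hidden unit m and symbol a at column i
        self.w = [[[self.rng.gauss(0.0, 0.01) for _ in range(Q)]
                   for _ in range(n_sites)] for _ in range(n_hidden)]

    def hidden_probs(self, v):
        """P(h_m = 1 | v) for every hidden unit m."""
        return [sigmoid(self.hbias[m] + sum(self.w[m][i][a] for i, a in enumerate(v)))
                for m in range(self.n_hidden)]

    def sample_hidden(self, v):
        return [1 if self.rng.random() < p else 0 for p in self.hidden_probs(v)]

    def sample_visible(self, h):
        """Sample each column independently given the hidden layer (softmax over symbols)."""
        v = []
        for i in range(self.n_sites):
            logits = [self.fields[i][a] + sum(h[m] * self.w[m][i][a]
                                              for m in range(self.n_hidden))
                      for a in range(Q)]
            top = max(logits)
            weights = [math.exp(x - top) for x in logits]
            v.append(self.rng.choices(range(Q), weights=weights)[0])
        return v

    def cd1_step(self, v_data, lr=0.05):
        """One CD-1 update: data statistics minus one-step reconstruction statistics."""
        p_data = self.hidden_probs(v_data)
        v_model = self.sample_visible(self.sample_hidden(v_data))
        p_model = self.hidden_probs(v_model)
        for m in range(self.n_hidden):
            self.hbias[m] += lr * (p_data[m] - p_model[m])
            for i in range(self.n_sites):
                self.w[m][i][v_data[i]] += lr * p_data[m]
                self.w[m][i][v_model[i]] -= lr * p_model[m]
        for i in range(self.n_sites):
            self.fields[i][v_data[i]] += lr
            self.fields[i][v_model[i]] -= lr

    def sample(self, n_steps=50):
        """Run alternating (block) Gibbs sampling from a random start and return a sequence."""
        v = [self.rng.randrange(Q) for _ in range(self.n_sites)]
        for _ in range(n_steps):
            v = self.sample_visible(self.sample_hidden(v))
        return decode(v)


# Toy aligned sequences, made up for illustration (not real riboswitch data).
toy_alignment = ["GGCAUC", "GGCAUC", "GACAUU", "GGC-UC"]
rbm = TinyRBM(n_sites=6, n_hidden=3, seed=1)
for _ in range(100):
    for seq in toy_alignment:
        rbm.cd1_step(encode(seq))
print(rbm.sample())
```

Because the visible units are conditionally independent given the hidden layer, and vice versa, each Gibbs step samples a whole layer at once; this is what keeps sampling from the two-layer model simple. A practical implementation would use vectorized array operations, minibatches, and regularization rather than the nested Python loops shown here.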

Authors

  • Jorge Fernandez-de-Cossio-Diaz
    Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Sorbonne Université, Paris, France.