inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences
Journal: arXiv
Published Date: Jun 25, 2025
Abstract
The accurate development, assessment, interpretation, and benchmarking of
bioinformatics frameworks for analyzing transcriptional regulatory grammars
rely on controlled simulations to validate the underlying methods. However,
existing simulators often lack end-to-end flexibility or ease of integration,
which limits their practical use. We present inMOTIFin, a lightweight, modular,
and user-friendly Python-based software that addresses these gaps by providing
versatile and efficient simulation and modification of DNA regulatory
sequences. inMOTIFin lets users generate customizable motifs and insert motif
instances into sequences with precise control over their positions,
co-occurrences, and spacing, and it can also modify real sequences directly.
Together, these features facilitate a comprehensive evaluation of motif-based
methods and interpretation tools. We
demonstrate inMOTIFin applications in assessing de novo motif discovery,
analyzing transcription factor cooperativity, and supporting explainability
analyses for deep learning models. inMOTIFin
ensures robust and reproducible analyses for studying transcriptional
regulatory grammars.
inMOTIFin is available on PyPI (https://pypi.org/project/inMOTIFin/) and Docker
Hub (https://hub.docker.com/r/cbgr/inmotifin). Detailed documentation is
available at https://inmotifin.readthedocs.io/en/latest/. The code for the use
case analyses is available at
https://bitbucket.org/CBGR/inmotifin_evaluation/src/main/.
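
To make the idea of controlled motif insertion concrete, the following is a minimal sketch in plain Python. It does not use the inMOTIFin API: every function name and parameter here (sample_pwm, sample_instance, random_background, insert_pair, concentration, position, spacing) is hypothetical and chosen only for illustration. It also assumes that motif instances overwrite background bases so that the sequence length stays fixed, which may differ from how inMOTIFin behaves.

```python
import random

BASES = "ACGT"


def sample_pwm(length, rng, concentration=0.5):
    """Draw a random position probability matrix (one row per motif position).

    Each row is a normalized set of gamma draws, i.e. a Dirichlet sample.
    """
    pwm = []
    for _ in range(length):
        weights = [rng.gammavariate(concentration, 1.0) for _ in BASES]
        total = sum(weights)
        pwm.append([w / total for w in weights])
    return pwm


def sample_instance(pwm, rng):
    """Sample one motif instance, drawing each base from its PWM row."""
    return "".join(rng.choices(BASES, weights=row)[0] for row in pwm)


def random_background(length, rng):
    """Generate a uniform random DNA background (an assumed background model)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def insert_pair(background, first, second, position, spacing):
    """Place `first` at `position`, then `second` after a gap of `spacing` bases.

    Background bases are overwritten, so the output has the same length.
    """
    second_start = position + len(first) + spacing
    end = second_start + len(second)
    if position < 0 or spacing < 0 or end > len(background):
        raise ValueError("motif pair does not fit in the background")
    seq = list(background)
    seq[position:position + len(first)] = first
    seq[second_start:end] = second
    return "".join(seq)


rng = random.Random(42)  # fixed seed for reproducibility
pwm_a = sample_pwm(6, rng)
pwm_b = sample_pwm(8, rng)
background = random_background(50, rng)
inst_a = sample_instance(pwm_a, rng)
inst_b = sample_instance(pwm_b, rng)
simulated = insert_pair(background, inst_a, inst_b, position=10, spacing=3)
print(simulated)
```

Because the positions and spacing of the inserted instances are known exactly, a sequence built this way can serve as ground truth when checking whether a motif discovery or model interpretation method recovers the planted pair.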