The arginine dihydrolase pathway (arc operon) provides a metabolic niche by transforming arginine into metabolic byproducts. We investigate the role of the arc operon in probiotic Escherichia coli Nissle 1917 on human gut community assembly and healt...
Optimizing enzymes to function in novel chemical environments is a central goal of synthetic biology, but optimization is often hindered by a rugged fitness landscape and costly experiments. In this work, we present TeleProt, a machine learning (ML) ...
Evolution-based deep generative models represent an exciting direction in understanding and designing proteins. An open question is whether such models can learn specialized functional constraints that control fitness in specific biological contexts....
As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene functions w...
Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the transformer architecture, which scal...
A strategy to obtain the greatest number of best-performing variants with least amount of experimental effort over the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Tow...
Effective and precise mammalian transcriptome engineering technologies are needed to accelerate biological discovery and RNA therapeutics. Despite the promise of programmable CRISPR-Cas13 ribonucleases, their utility has been hampered by an incomplet...
Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial-intelligence-driven protein design. However, we lack a sufficient understanding of how very large-...
We present scTenifoldXct, a semi-supervised computational tool for detecting ligand-receptor (LR)-mediated cell-cell interactions and mapping cellular communication graphs. Our method is based on manifold alignment, using LR pairs as inter-data corre...
A major challenge in the analysis of highly multiplexed imaging data is the assignment of cells to a priori known cell types. Existing approaches typically solve this by clustering cells followed by manual annotation. However, these often require sev...