Machine learning-based models of protein fitness typically learn from either unlabeled, evolutionarily related sequences or variant sequences with experimentally measured labels. For regimes where only limited experimental data are available, recent ...
A principal challenge in the analysis of tissue imaging data is cell segmentation-the task of identifying the precise boundary of every cell in an image. To address this problem we constructed TissueNet, a dataset for training segmentation models tha...
Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and...
Drug discovery focused on target proteins has been a successful strategy, but many diseases and biological processes lack obvious targets to enable such approaches. Here, to overcome this challenge, we describe a deep learning-based efficacy predicti...
The scientific ecosystem relies on citation-based metrics that provide only imperfect, inconsistent and easily manipulated measures of research quality. Here we describe DELPHI (Dynamic Early-warning by Learning to Predict High Impact), a framework t...
Integrating large single-cell gene expression, chromatin accessibility and DNA methylation datasets requires general and scalable computational approaches. Here we describe online integrative non-negative matrix factorization (iNMF), an algorithm for...
Modern experimental technologies can assay large numbers of biological sequences, but engineered protein libraries rarely exceed the sequence diversity of natural protein families. Machine learning (ML) models trained directly on experimental data wi...
We present Augur, a method to prioritize the cell types most responsive to biological perturbations in single-cell data. Augur employs a machine-learning framework to quantify the separability of perturbed and unperturbed cells within a high-dimensio...
We have developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small-molecule design. GENTRL optimizes synthetic feasibility, novelty, and biological activity. We used GENTRL to discover potent inhibitors...