BULLKpy: An AnnData-Inspired Unified Framework for Comprehensive Bulk OMICs Analysis

Journal: bioRxiv
Published Date:

Abstract

Bulk OMICs data, such as RNA-seq and proteomics, remain foundational in biomedical and cancer research. While single-cell transcriptomics has revolutionized our understanding of cellular heterogeneity, bulk OMICs continue to provide unique advantages for large-scale studies, integrative analyses, and robust biomedical correlations. Yet, analytical workflows for bulk OMICs are frequently fragmented across disparate tools, programming languages, and data formats, in contrast to the single-cell field, which benefits from comprehensive and standardized frameworks like Scanpy and the broader scverse ecosystem. Here, we introduce BULLKpy, a Python-based, scverse-inspired framework designed to deliver similar integration, flexibility, and scalability to bulk RNA-seq analysis, with a particular emphasis on cancer research. Beyond standard preprocessing and visualization utilities, BULLKpy offers systematic evaluation of gene-metadata associations, categorical enrichment analyses, clustering stability metrics, and advanced visualization strategies that facilitate intuitive exploration of tumor heterogeneity in expression profiles and metaprograms. Cancer-specific visualizations, such as oncoprints, are natively supported and seamlessly integrated with transcriptomic and clinical features, allowing users to relate somatic alterations to expression-derived phenotypes. By standardizing workflows within the Python ecosystem and aligning with AnnData objects and the scverse, BULLKpy aims to democratize advanced transcriptomic and proteomic analyses, facilitate integrative cancer research, and support future developments at the intersection of computational biology, multi-omics integration, and artificial intelligence.
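To make the idea of AnnData-aligned gene-metadata association concrete, the following is a minimal sketch using only the Python standard library. It is not BULLKpy's API and does not use the anndata package: the `BulkTable` class, the `gene_metadata_association` function, the gene names, and the toy values are all hypothetical and chosen for illustration. It assumes a samples × genes matrix stored alongside per-sample metadata, and uses Pearson correlation as one simple choice of association measure.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BulkTable:
    """Hypothetical stand-in for an AnnData-style container (not the anndata API)."""
    X: list[list[float]]          # samples x genes expression matrix
    obs: dict[str, list[float]]   # per-sample metadata, one list per column
    var_names: list[str]          # gene names, one per column of X


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / sqrt(var_a * var_b)


def gene_metadata_association(data, obs_key):
    """Correlate every gene (column of X) with one numeric metadata column."""
    covariate = data.obs[obs_key]
    result = {}
    for j, gene in enumerate(data.var_names):
        expression = [row[j] for row in data.X]
        result[gene] = pearson(expression, covariate)
    return result


# Toy data, invented for illustration only: 4 samples, 3 genes.
toy = BulkTable(
    X=[[1.0, 8.0, 5.0],
       [2.0, 6.0, 5.0],
       [3.0, 4.0, 5.0],
       [4.0, 2.0, 5.0]],
    obs={"age": [30.0, 40.0, 50.0, 60.0]},
    var_names=["GENE_A", "GENE_B", "GENE_C"],
)

associations = gene_metadata_association(toy, "age")
for gene, r in associations.items():
    print(gene, r)
```

Keeping the expression matrix, sample metadata, and gene names in one object means each gene's values stay aligned with the same sample order as the covariate, which is the property an AnnData-style layout is meant to guarantee. A real analysis would add a significance test and multiple-testing correction, which this sketch omits.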

Authors

  • Malumbres, M.

Categories