A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials
Journal: arXiv
Published Date: Jun 12, 2025
Abstract
We design an efficient algorithm that outputs a linear classifier for
identifying homogeneous subsets (equivalently, subcohorts) of large
inhomogeneous datasets. Our theoretical contribution is a rounding technique,
similar to that of Goemans and Williamson (1994), that approximates the optimal
solution of the underlying optimization problem within a factor of $0.82$. As
an application, we use our algorithm to design a simple test that can identify
homogeneous subcohorts of patients, consisting mainly of metastatic
cases, in the breast cancer RNA microarray dataset of Curtis et al.
(2012). We also use the test output by the algorithm to
systematically identify subcohorts of patients in which statistically
significant changes in methylation levels of tumor suppressor genes co-occur
with statistically significant changes in nuclear receptor expression.
Identifying such homogeneous subcohorts of patients can aid the
discovery of disease pathways and therapeutics specific to each subcohort.
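The abstract does not spell out the rounding step, so the sketch below illustrates only the classic Goemans-Williamson random-hyperplane rounding for MAX-CUT that the paper's technique is described as resembling. It is not the authors' method. The unit vectors are assumed to come from an already solved semidefinite relaxation, and the helper names (`hyperplane_round`, `cut_value`, `best_of`) are hypothetical. For MAX-CUT this classic scheme guarantees roughly 0.878 in expectation; the 0.82 factor above refers to the paper's own problem and rounding. A connection to the paper's output is that the random normal vector `r` defines a linear classifier x ↦ sign(⟨r, x⟩).

```python
import math
import random
from typing import Sequence

Vector = Sequence[float]


def hyperplane_round(vectors: Sequence[Vector], rng: random.Random) -> list[int]:
    """Classic GW rounding: label each (assumed unit) SDP vector by the side of a
    uniformly random hyperplane through the origin on which it falls."""
    dim = len(vectors[0])
    # A standard Gaussian vector has a uniformly random direction, so the
    # hyperplane with normal r is uniformly random.
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # sign(<v, r>) is a linear classifier applied to each vector.
    return [1 if sum(vi * ri for vi, ri in zip(v, r)) >= 0 else -1 for v in vectors]


def cut_value(edges: Sequence[tuple[int, int]], signs: Sequence[int]) -> int:
    """Number of edges whose endpoints receive different labels."""
    return sum(1 for i, j in edges if signs[i] != signs[j])


def best_of(vectors: Sequence[Vector], edges: Sequence[tuple[int, int]],
            trials: int, seed: int = 0) -> list[int]:
    """Repeat the rounding several times and keep the best cut found."""
    rng = random.Random(seed)
    best = hyperplane_round(vectors, rng)
    for _ in range(trials - 1):
        candidate = hyperplane_round(vectors, rng)
        if cut_value(edges, candidate) > cut_value(edges, best):
            best = candidate
    return best


if __name__ == "__main__":
    # Hypothetical SDP solution for a triangle: three unit vectors 120 degrees apart.
    tri = [(1.0, 0.0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2)]
    tri_edges = [(0, 1), (1, 2), (0, 2)]
    labels = best_of(tri, tri_edges, trials=10, seed=1)
    print(labels, cut_value(tri_edges, labels))
```

Because the three triangle vectors sum to zero, their projections onto any `r` cannot all share a sign, so every rounding cuts exactly two of the three edges, which is the maximum cut of a triangle. Repeating the rounding (`best_of`) is a common way to reduce variance in practice; it does not change the expected-value guarantee.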