Multiple Linked Tensor Factorization
Journal: arXiv
Published Date: Feb 27, 2025
Abstract
In biomedical research and other fields, it is now common to generate
high-content data that are both multi-source and multi-way. Multi-source data are
collected from different high-throughput technologies while multi-way data are
collected over multiple dimensions, yielding multiple tensor arrays.
Integrative analysis of these data sets is needed, e.g., to capture and
synthesize different facets of complex biological systems. However, despite
growing interest in multi-source and multi-way factorization techniques,
methods that can handle data that are both multi-source and multi-way are
limited. In this work, we propose a Multiple Linked Tensor Factorization
(MULTIFAC) method that extends the CANDECOMP/PARAFAC (CP) decomposition to
simultaneously reduce the dimension of multiple multi-way arrays and
approximate the underlying signal. We first introduce a version of the CP
factorization with L2 penalties on the latent factors, leading to rank
sparsity. When extended to multiple linked tensors, the method automatically
reveals latent components that are shared across data sources or individual to
each data source. We also extend the decomposition algorithm to an
expectation-maximization (EM) version that handles incomplete data by
imputation. Extensive simulation studies are conducted to demonstrate
MULTIFAC's ability to (i) approximate the underlying signal, (ii) identify shared
and unshared structures, and (iii) impute missing data. The approach yields an
interpretable decomposition on multi-way multi-omics data for a study on
early-life iron deficiency.
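
To make the two algorithmic ideas in the abstract concrete, the following is a minimal NumPy sketch of a CP decomposition with an L2 penalty on the factors, combined with an EM-style imputation step for missing entries. It is an illustration for a single three-way tensor under stated assumptions, not the MULTIFAC implementation: the squared-Frobenius (ridge) form of the penalty, the alternating least squares updates, and the names `penalized_cp`, `khatri_rao`, `unfold`, `reconstruct`, `lam`, and `n_iter` are all choices made here and do not come from the paper. Linking several tensors through shared and individual components is not shown.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)


def unfold(x, mode):
    """Mode-n unfolding; the remaining axes are flattened in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def reconstruct(factors):
    """Rebuild a three-way tensor from its CP factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def penalized_cp(x, rank, lam=1.0, n_iter=200, seed=0):
    """Hypothetical helper: ridge-penalized CP via ALS with EM-style imputation.

    Assumed objective (not taken from the paper):
        ||X - [[A, B, C]]||_F^2 + lam * (||A||_F^2 + ||B||_F^2 + ||C||_F^2)
    Missing entries of x are marked with NaN.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(x)
    filled = np.where(mask, x, 0.0)  # start by imputing zeros
    factors = [rng.standard_normal((n, rank)) for n in x.shape]
    eye = np.eye(rank)

    for _ in range(n_iter):
        # Update each factor by ridge regression with the other two held fixed.
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            # Gram matrix of the Khatri-Rao product, computed cheaply.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            rhs = unfold(filled, mode) @ kr
            factors[mode] = np.linalg.solve(gram + lam * eye, rhs.T).T
        # Imputation step: replace missing entries with the current fitted values.
        filled = np.where(mask, x, reconstruct(factors))

    return factors, filled
```

Calling `penalized_cp(x, rank=3, lam=0.1)` on an array `x` with NaNs at the missing positions returns the three factor matrices and a completed copy of `x`. In this sketch, a larger `lam` shrinks the factor columns more strongly, which is the mechanism by which a penalty of this kind can push unneeded components toward zero; the rank-sparsity behavior claimed in the abstract is a property of the authors' method and is not verified here.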