HCS-hierarchical algorithm for simulation of omics datasets.
Journal:
Bioinformatics (Oxford, England)
Published Date:
Sep 1, 2024
Abstract
MOTIVATION: Analysis of the omics data with the help of machine learning (ML) methods is limited by small sample sizes and a large number of variables. One possible approach to deal with such data is using algorithms for feature selection and reducing the dataset to include only those variables that are related to the studied phenomena. Existing simulators of the omics data were mostly developed with the goal of improving the methods for generations of high-quality data, that correspond with the highest possible fidelity to the real level of molecular markers in the biological materials. The current study aims to simulate the data on a higher level of generalization. Such datasets can then be used to perform tests of the feature selection and ML algorithms on systems that have structures mimicking those of real data, but where the ground truth may be implanted by design. They can also be used to generate contrast variables with the desired correlation structure for the feature selection.