Improving random forest predictions in small datasets from two-phase sampling designs.
Journal: BMC Medical Informatics and Decision Making
PMID: 34809631
Abstract
BACKGROUND: While random forests are one of the most successful machine learning methods, it is necessary to optimize their performance for datasets that result from a two-phase sampling design with a small number of cases. This is a common situation in biomedical studies, which often involve rare outcomes and covariates whose measurement is resource-intensive.