Improving random forest predictions in small datasets from two-phase sampling designs.

Journal: BMC Medical Informatics and Decision Making
PMID:

Abstract

BACKGROUND: Random forests are among the most successful machine learning methods, but their performance needs to be optimized for datasets that arise from a two-phase sampling design with a small number of cases. This situation is common in biomedical studies, which often have rare outcomes and covariates that are resource-intensive to measure.

Authors

  • Sunwoo Han
    Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  • Brian D Williamson
    Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  • Youyi Fong
    Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA. youyifong@gmail.com.