Semi-automated Conversion of Clinical Trial Legacy Data into CDISC SDTM Standards Format Using Supervised Machine Learning.

Journal: Methods of Information in Medicine
Published Date:

Abstract

OBJECTIVE: This study aimed to develop a semi-automated process for converting legacy data into the Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) format by combining human verification with three methods: data normalization; feature extraction by distributed representation of dataset names, variable names, and variable labels; and supervised machine learning.
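
The three steps named in the objective can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the embedding model or the classifier, so hashed character trigrams stand in for the distributed representation and a nearest-centroid rule stands in for the supervised learner. The legacy dataset names, variable names, labels, and the mapping targets in TRAINING are all hypothetical examples.

```python
import math
import re
from collections import defaultdict

DIM = 64  # size of the hashed feature vector (arbitrary choice for this sketch)

def normalize(text):
    """Step 1: lowercase, turn non-alphanumerics into spaces, trim the ends."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def featurize(dataset, variable, label):
    """Step 2 (stand-in): hash character trigrams into a unit-length vector."""
    text = " ".join(normalize(part) for part in (dataset, variable, label))
    vec = [0.0] * DIM
    for i in range(len(text) - 2):
        # Deterministic hash; Python's built-in hash() is salted per process.
        idx = sum(ord(c) * 31 ** k for k, c in enumerate(text[i:i + 3])) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def train(examples):
    """Step 3 (stand-in): average the feature vectors per target variable."""
    sums = defaultdict(lambda: [0.0] * DIM)
    counts = defaultdict(int)
    for dataset, variable, label, target in examples:
        for j, v in enumerate(featurize(dataset, variable, label)):
            sums[target][j] += v
        counts[target] += 1
    return {t: [v / counts[t] for v in s] for t, s in sums.items()}

def predict(model, dataset, variable, label):
    """Return (suggested_target, score); a human reviewer confirms or rejects it."""
    vec = featurize(dataset, variable, label)
    scores = {t: sum(a * b for a, b in zip(vec, c)) for t, c in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical labeled legacy metadata: (dataset, variable, label, target).
TRAINING = [
    ("DEMOG", "SEX", "Sex of subject", "DM.SEX"),
    ("DEMO", "GENDER", "Gender", "DM.SEX"),
    ("DEMOG", "BIRTHDT", "Date of birth", "DM.BRTHDTC"),
    ("DEMO", "DOB", "Birth date", "DM.BRTHDTC"),
    ("ADVERSE", "AETERM", "Adverse event term", "AE.AETERM"),
    ("AE_LOG", "EVENT", "Reported adverse event", "AE.AETERM"),
]

model = train(TRAINING)
suggestion, score = predict(model, "DM_RAW", "SEXCD", "Subject sex")
print(suggestion, round(score, 3))
```

The score returned by predict is only a similarity value, so in a semi-automated workflow of this kind a low score would be a natural cue to route the suggestion to manual review rather than accept it.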

Authors

  • Takuma Oda
    Division of Biostatistics, Tohoku University Graduate School of Medicine, Sendai-city, Miyagi Prefecture, Japan.
  • Shih-Wei Chiu
    Division of Biostatistics, Tohoku University Graduate School of Medicine, Sendai-city, Miyagi Prefecture, Japan.
  • Takuhiro Yamaguchi
    Division of Biostatistics, Tohoku University Graduate School of Medicine, Sendai-city, Miyagi Prefecture, Japan.