Synthetic health data in Canada: A scoping review of methods, applications, and data sources.

Journal: Canadian journal of public health = Revue canadienne de sante publique

Abstract

BACKGROUND: Access to provincial health-related data for multi-jurisdictional studies in Canada is restricted by privacy laws. Synthetic data (SD), which mimic real data, can help preserve privacy. However, information on how SD are used in Canadian research is limited.

OBJECTIVES: To review the characteristics, methods, and applications of published studies that generated SD from Canadian health data (HD), including administrative, survey, public health, and clinical sources.

METHODS: We conducted a scoping review following the Arksey and O'Malley framework, Joanna Briggs Institute guidance, and the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines, covering studies up to September 2024 that generated SD from provincial or national HD. We included English-language peer-reviewed articles and grey literature identified through PubMed, Scopus, Web of Science, Google, and reference lists. We extracted and descriptively analyzed data on HD types, research purposes, geographic sources, synthesis methods, and quality evaluation.

SYNTHESIS: Of 232 identified articles, 31 were reviewed and nine met the inclusion criteria; three additional articles were found through reference lists and Google, giving 12 included articles, 11 of which were peer-reviewed. Topics included data replication, bias mitigation, and privacy-risk assessment. Survey data were the most commonly synthesized. SD were generated from national and provincial datasets, including the Canadian Community Health Survey and administrative and clinical data from Alberta, Manitoba, British Columbia, and Ontario. Synthesis methods included generative, sampling, and predictive models. Data quality evaluations assessed replicability, privacy risk, and predictive performance.

CONCLUSION: SD have mainly been used in single-province studies and national surveys. Broader use with clinical and public health data, applied with methodological consistency, could strengthen their role in privacy-protecting, multi-jurisdictional research and surveillance initiatives.
