[Progress in method development and application of distributed learning for estimation of epidemiological effect].

Journal: Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi
Published Date:

Abstract

This study systematically reviewed progress in the development and application of distributed learning methods for estimating epidemiological effects, to provide a methodological reference for multi-center studies.
We searched for English-language papers published up to December 31, 2023, using the keywords "health/medical big data" and "distributed/federated learning". After consulting experts, we set inclusion and exclusion criteria and created a data extraction framework. We collected basic study details as well as information on methods, applications, and evaluation. Two researchers independently screened the papers and extracted information. EndNote 20 was used for literature management and EpiData for data management.
A total of 3,444 papers were retrieved, and 29 were included in the final analysis. Most of the papers (25, 86.2%) were published in or after 2019, and most came from the United States (21/29, 72.4%). For the estimation of epidemiological effects, 22 distributed learning methods had been developed, covering logistic regression (8), Cox regression (8), Poisson regression (2), and generalized linear mixed models (GLMM) (4), along with three platforms for distributed analysis (VLP, Vantage6, AusCAT). The 29 papers described 45 applications, of which 20 (44.4%) focused on developing prediction models and 25 (55.6%) on association analysis. Importantly, except for GLMM, current distributed learning methods can estimate effects with little bias in 1-3 rounds of communication. These methods show less bias than meta-analysis, especially when handling data heterogeneity and rare outcomes. However, few studies examined how differences in data structure and sparse data affect results, an area that requires further research.
While distributed learning shows promise for epidemiological effect estimation, it is still at an early stage of development, and further research is needed on handling data heterogeneity and improving communication efficiency.
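
The general idea of estimating an effect from site-level summaries, without sharing individual records, can be sketched as follows. This is a minimal illustration only, assuming a logistic model with an intercept and one covariate, and it is not one of the 22 methods covered by the review. It uses plain Newton-Raphson, where each iteration is one round of communication, so it typically needs more rounds than the 1-3 reported for the reviewed methods. The function names (`site_summaries`, `distributed_logistic`) and the toy data are hypothetical.

```python
import math

def site_summaries(rows, beta):
    """Run locally at one site: return the score vector and the
    information matrix (as h00, h01, h11) for the current beta.
    Only these aggregates leave the site, never the rows."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        r = y - p            # residual
        w = p * (1.0 - p)    # variance weight
        g0 += r
        g1 += r * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)

def distributed_logistic(sites, rounds=25, tol=1e-10):
    """Coordinator: sum the site summaries and take Newton steps.
    Each loop iteration corresponds to one round of communication."""
    beta = (0.0, 0.0)
    for _ in range(rounds):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for rows in sites:
            (a, b), (c, d, e) = site_summaries(rows, beta)
            g0 += a
            g1 += b
            h00 += c
            h01 += d
            h11 += e
        # Solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        beta = (beta[0] + step0, beta[1] + step1)
        if max(abs(step0), abs(step1)) < tol:
            break
    return beta

# Hypothetical toy data: (exposure x, outcome y) pairs held at three sites.
sites = [
    [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1)],
    [(0, 0), (0, 1), (1, 1), (1, 1), (1, 0)],
    [(0, 0), (0, 0), (1, 1), (1, 0), (0, 1)],
]
beta_hat = distributed_logistic(sites)

# Reference fit on the pooled rows, which a real study could not compute.
pooled_rows = [row for rows in sites for row in rows]
beta_pooled = distributed_logistic([pooled_rows])
```

Because the score and information matrix are sums over individuals, summing them across sites reproduces the pooled-data fit in this sketch, whereas a meta-analysis would combine separately estimated site-level coefficients.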

Authors

  • J T Yang
    Centers of System Biology, Data Information and Reproductive Health, School of Basic Medical Science, Central South University, Changsha, 410008, Hunan, China.
  • X Gao
  • X X Wang
  • M D Zhang
    Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China; Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing 100191, China.
  • X Chen
    Division of Infectious Diseases, The People's Hospital of Meizhou, Meizhou, China.
  • Y L Wang
    Department of Forensic Medicine, Inner Mongolia Medical University, Hohhot 010030, China.
  • Z K Liu
    Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China; Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing 100191, China.
  • S Y Zhan