How do clinician and parent reported data differ? An analysis of similarity and difference in the datasets from a cross-syndrome genetics cohort study (GenROC)

Journal: medRxiv
Published Date:

Abstract

Parent/patient-reported datasets provide ready access to phenotypic data for monogenic neurodevelopmental disorders, yet their concordance with clinical data is unclear. In the GenROC study, 547 children (infancy to 16 years, mean age 7.6 years, balanced sex ratio) had parallel parent-reported (PRD) web questionnaires and clinician-reported (CRD) Human Phenotype Ontology (HPO) proformas. We compared the two sources per participant, by system, by gene and gene group, and overall, for quantity, detail and similarity. PRD provided more terms for the dental, gastroenterology, immunology and respiratory systems and for vision (p<0.001 for all) and, to a lesser degree, for cardiac (p=0.0012). CRD provided more detail than PRD for most gene subgroups, for combined systems and for neurology (p<0.001). Per-participant similarity scores were low overall (mean 0.38 for combined systems). Similarity scores were highest for cardiac (mean 0.74) and lowest for ENT (mean 0.34). There was minimal difference in similarity scores across gene groups or between the top 10 genes: the scaffold adaptor gene group scored highest among gene groups (mean 0.43), and STXBP1 (mean 0.5) and CACNA1A (mean 0.49) scored highest among the top 10 genes. CRD was more similar than PRD to published syndrome phenotypes for syndromic genes. Parents reported more common childhood phenotypes, such as asthma and dental issues, whilst clinicians provided clinical phenotype descriptors, such as brain morphology and seizure semiology. It is important to understand these differences when designing studies and utilising datasets, in order to appreciate the strengths and limitations of each source.

Parent-reported data are increasingly used in rare disease research due to their accessibility and breadth. Previous studies have shown that such data can be consistent with published literature, particularly in syndromic conditions. However, direct comparisons between parent-reported and clinician-reported data at the individual level have been limited, leaving a gap in understanding the reliability and granularity of these data sources.

This study provides the first large-scale, individual-level comparison of parent-reported and clinician-reported phenotypic data across a cross-syndrome cohort. It demonstrates that while both sources contribute similar quantities of data, they differ in content and detail. Parents tend to report common childhood and lived-experience phenotypes, whereas clinicians provide more specific clinical descriptors. The study also shows that clinician data are more consistent with published syndrome phenotypes, especially in syndromic genes.

These findings highlight the complementary nature of parent and clinician data in rare disease research. Future studies and registries should consider integrating both sources to enhance phenotypic richness and accuracy. Policymakers and researchers designing data collection tools or machine learning applications should account for the strengths and limitations of each data type, ensuring that lived-experience data are not overlooked in phenotype descriptions.

Authors

  • KJ Low; H Day; M Thanthilla; C Davis; HV Firth; CF Wright