Automatic contour quality assurance using deep-learning based contours.
Journal:
Physics in Medicine and Biology
Published Date:
Jul 16, 2025
Abstract
Safe deployment of auto-contouring models requires the inclusion of automated quality assurance (QA). One such approach is to use two independent auto-contouring models and compare them geometrically for acceptability. This is not effective because geometric differences may not correlate with clinically significant errors. Herein, we investigated whether a two-contour QA system is improved by including dose in this comparison. Volumetric modulated arc therapy plans were generated for 86 head and neck (H&N) and 50 cervical (GYN) cancer patients, using clinically approved planning target volumes (PTVs) and auto-contoured organs-at-risk (OARs) from a primary auto-contouring model. Doses to the primary OARs were compared with doses to manually drawn and approved OARs ('the truth'). A difference in or ⩾ 2 Gy was identified as a reporting error (). A second, independent auto-contouring model was then used to contour the OARs (verification). The primary and verification auto-contouring models were compared geometrically (Dice similarity coefficient (DSC), surface DSC, 95% Hausdorff distance, mean surface distance) and dosimetrically (,). The ability of comparison metrics between the two auto-contouring models to flag actual dosimetric errors (i.e. primary model compared with the truth) was investigated. A logistic regression model was used to predict. The data were divided by disease site and into 50/50 stratified training and testing sets; -fold cross-validation was employed during training to avoid overfitting. H&N structures were further divided into size-specific groups to improve model performance and generalizability. Including dose metrics in the logistic regression model to predict increased the performance in terms of receiver-operating characteristic curve-area under the curve and area under the precision-recall curve in the test set for H&N small structures.
For, including dose metrics increased performance for H&N small structures, H&N medium structures, and GYN structures. In many instances, utilizing dose with geometric comparisons can improve the ability of a verification model to flag potential errors from a primary auto-contouring model.
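
As a rough illustration of two quantities named in the abstract, the sketch below computes a Dice similarity coefficient between a primary and a verification contour and applies the 2 Gy dose-difference threshold. It is not the authors' implementation: the function names, the representation of a contour as a set of voxel indices, the use of an absolute difference, and all example values are assumptions made for this example only.

```python
# Minimal sketch, not the published method. Contours are represented here
# as sets of (z, y, x) voxel indices, which is an assumption of this example.

DOSE_ERROR_THRESHOLD_GY = 2.0  # threshold stated in the abstract


def dice_similarity(voxels_a, voxels_b):
    """Return DSC = 2|A n B| / (|A| + |B|) for two collections of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        # Convention assumed here: two empty contours are treated as identical.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def is_dose_error(dose_primary_gy, dose_truth_gy,
                  threshold_gy=DOSE_ERROR_THRESHOLD_GY):
    """Flag a dose metric whose absolute difference from the truth is >= threshold.

    Using the absolute difference is an assumption; the abstract only
    states a difference of at least 2 Gy.
    """
    return abs(dose_primary_gy - dose_truth_gy) >= threshold_gy


# Hypothetical toy contours (four and three voxels, two of them shared).
primary = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
verification = {(0, 0, 1), (0, 1, 1), (0, 2, 1)}

dsc = dice_similarity(primary, verification)
flagged = is_dose_error(41.5, 39.0)  # hypothetical doses in Gy
```

In a two-model QA setup of the kind described, values such as `dsc` and the dose differences between the primary and verification contours would serve as input features to a classifier, while `is_dose_error` against the manually approved contour would define the label to be predicted.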