Comparison of glass-based and whole-slide image-based grade assignment for clear-cell and papillary renal cell carcinoma: a multi-cohort, international reproducibility study.
Journal: Histopathology
Published Date: Oct 31, 2025
Abstract
BACKGROUND AND AIMS: Histological grading of renal cell carcinoma (RCC) is an important part of diagnostic evaluation. The reproducibility of RCC grading using whole-slide imaging (WSI), compared with glass-slide microscopy, is understudied. The aim of this study was a head-to-head evaluation of WSI-based and glass-based grading in the clear-cell (ccRCC) and papillary (pRCC) RCC subtypes. METHODS: Four cohorts of cases with glass slides and corresponding digitized WSIs were included from two institutions (number of cases: Institution 1 (I-1), ccRCC 100 and pRCC 89; Institution 2 (I-2), ccRCC 97 and pRCC 50). Nine board-certified pathologists assigned grades; some of them graded both the glass slides and the WSIs of the same cohorts. Interobserver and intraobserver (cross-modality) analyses were carried out using kappa statistics, including comparisons with majority-vote and consensus grades. Data on a prognostic endpoint (overall survival) were available for cases from Institution 1. RESULTS: In ccRCC, pairwise interobserver agreement among pathologists was low to moderate and similar for glass-based (kappa range 0.14-0.77) and WSI-based (0.12-0.83) grading; results for pRCC were generally similar. Agreement differed significantly between the datasets from the two institutions: the average ccRCC kappa was 0.73 for I-1 and 0.54 for I-2 with glass-based grading, and 0.66 and 0.48, respectively, with WSI-based grading, suggesting staining differences as a potentially important confounder. Intraobserver analyses (same pathologist, same cases, glass-based vs. WSI-based) revealed significant differences in assigned grades, with trends toward both under-grading and over-grading. For ccRCC (pathologists: n = 5 for I-1, n = 3 for I-2), the kappa range was 0.47-0.90 for I-1 and 0.43-0.70 for I-2.
In the majority-vote/consensus grade analysis, there was a clear general trend toward over-grading with the WSI-based approach, with more cases scored as G4. Prognostic analysis showed that both WSI-based and glass-based grades carry prognostic value. CONCLUSIONS: WSI-based grading of RCC yields grades that diverge from glass-based grading, with a trend toward over-grading. Interobserver and, especially, intraobserver agreement fall in the low-to-moderate range for both modalities, which warrants further standardization and exploration of artificial intelligence to make grading more objective. Institution-specific staining differences may be a confounder that reduces the reproducibility of RCC grading. We have made all digital datasets and grades openly available for education and research purposes.
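To make the agreement measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' analysis code. It assumes grades are encoded as integers 1-4 (G1-G4) and that unweighted Cohen's kappa is used; the abstract does not say which kappa variant was applied. The function names (`cohen_kappa`, `pairwise_kappa_summary`, `majority_vote`, `grade_shift`) are hypothetical, the tie-breaking rule in `majority_vote` is an assumption, and the toy grades are illustrative only, not study data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters grading the same cases in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("grade lists must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of cases given the identical grade.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal grade frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[g] * cb[g] for g in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical grade for every case; kappa is
        # undefined here, and returning 1.0 is an assumed convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa_summary(ratings):
    """Min, max and mean kappa over all rater pairs (interobserver).

    ratings maps rater id -> list of grades; at least two raters are needed.
    """
    kappas = [cohen_kappa(ratings[r1], ratings[r2])
              for r1, r2 in combinations(sorted(ratings), 2)]
    return min(kappas), max(kappas), sum(kappas) / len(kappas)


def majority_vote(ratings):
    """Per-case most frequent grade; ties go to the higher grade (hypothetical rule)."""
    raters = sorted(ratings)
    n_cases = len(ratings[raters[0]])
    result = []
    for i in range(n_cases):
        counts = Counter(ratings[r][i] for r in raters)
        top = max(counts.values())
        result.append(max(g for g, c in counts.items() if c == top))
    return result


def grade_shift(glass, wsi):
    """Count cases graded higher (over), lower (under) or equal on WSI vs. glass."""
    over = sum(w > g for g, w in zip(glass, wsi))
    under = sum(w < g for g, w in zip(glass, wsi))
    return {"over": over, "under": under, "same": len(glass) - over - under}


if __name__ == "__main__":
    # Toy grades for six cases from one pathologist; illustrative only.
    glass = [1, 2, 2, 3, 4, 2]
    wsi = [1, 2, 3, 3, 4, 3]
    print("intraobserver kappa:", round(cohen_kappa(glass, wsi), 3))
    print("grade shift:", grade_shift(glass, wsi))
```

Because RCC grades are ordinal, a linearly or quadratically weighted kappa would penalize a G1-vs-G4 disagreement more than a G2-vs-G3 one; swapping such a variant into `cohen_kappa` would change the numbers, so results from this sketch are not directly comparable to the values reported above.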