Predicting DNA replication-related protein expression at the nucleus level from HE: A deep learning study with same-specimen paired data.
Journal:
Acta histochemica
Published Date:
Apr 25, 2026
Abstract
BACKGROUND: The inference of molecular information from hematoxylin-eosin (HE) specimens may reduce the ancillary testing burden in digital pathology.
OBJECTIVE: To assess whether deep convolutional networks can predict, at the per-nucleus level, continuous DAB optical density (OD) and binary positivity for a biologically coherent panel of DNA replication-related proteins (CDC6, CDT1, MCM7, ORC1, CDC7, and Geminin) plus Ki-67 directly from HE nuclear images.
METHODS: We constructed a same-specimen paired HE/IHC dataset from 21 endometrioid carcinoma cases (7 per grade). After color unmixing via NMF-derived stain vectors and HoVer-Net-based nuclear segmentation, 100 × 100 nucleus-centered HE crops were paired one-to-one with IHC-derived per-nucleus OD. ImageNet-initialized backbones (ResNet-50 baseline vs. EfficientNet-B0 and MobileNetV3-Small) were trained with regression (OD) and classification (positivity) heads, and multi-task learning across markers was also evaluated. Case-wise splits ensured no patient overlap across the training, validation, and test sets.
RESULTS: Across markers, per-nucleus prediction of protein expression from HE nuclear morphology was feasible with moderate discriminative performance, with the strongest signal observed for MCM7 (AUC-ROC ≈ 0.71; F1 ≈ 0.72). Performance was marker-dependent: MCM7 and Ki-67 consistently showed stronger discrimination (AUC-ROC ≈ 0.70–0.72), Geminin and CDC6 were moderately predictable in some settings, and CDT1 remained near the chance level (AUC-ROC ≈ 0.50). Among the architectures evaluated, ResNet-50 demonstrated the most stable generalization, and multi-task training yielded modest average gains but was not consistently beneficial. For the Ki-67 labeling index, the nuclear-crop approach showed moderate agreement with WSI-based digital IHC (ROI-level r ≈ 0.54; MAE ≈ 15.2 percentage points). The major limitations of this study were the small cohort size (21 cases) and the lack of external validation.
CONCLUSIONS: Per-nucleus protein expression (both binary positivity and continuous OD) was shown to be inferable from HE nuclear morphology alone. These results suggest clinical utility and motivate future studies using larger external cohorts and self-supervised pretraining.
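The abstract gives no implementation details, so the following is only a minimal sketch of three quantities it relies on: optical density, a case-wise data split, and labeling-index agreement (Pearson r and MAE in percentage points). All function names and the split fractions are hypothetical. The OD helper assumes that an already-unmixed single-channel intensity is available, and it does not perform the NMF-based color unmixing described in the methods.

```python
import numpy as np

def optical_density(intensity, background=255.0):
    """Beer-Lambert optical density, OD = -log10(I / I0).

    Assumes `intensity` is an already-unmixed single-channel value on a
    0-255 scale; stain unmixing itself is not done here.
    """
    i = np.clip(np.asarray(intensity, dtype=float), 1.0, background)
    return -np.log10(i / background)

def case_wise_split(case_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Assign whole cases (not individual nuclei) to train/val/test.

    Every nucleus from a given case lands in exactly one partition, so
    no patient contributes to more than one of them.
    The fractions are illustrative, not taken from the study.
    """
    rng = np.random.default_rng(seed)
    cases = rng.permutation(np.unique(case_ids))
    n_test = max(1, round(len(cases) * test_frac))
    n_val = max(1, round(len(cases) * val_frac))
    test = set(cases[:n_test].tolist())
    val = set(cases[n_test:n_test + n_val].tolist())
    train = set(cases[n_test + n_val:].tolist())
    return train, val, test

def labeling_index(positive_flags):
    """Percentage of nuclei called positive within one region."""
    return 100.0 * float(np.mean(np.asarray(positive_flags, dtype=float)))

def agreement(pred_li, ref_li):
    """Pearson r and mean absolute error (percentage points)."""
    pred = np.asarray(pred_li, dtype=float)
    ref = np.asarray(ref_li, dtype=float)
    r = float(np.corrcoef(pred, ref)[0, 1])
    mae = float(np.mean(np.abs(pred - ref)))
    return r, mae

if __name__ == "__main__":
    # Toy data: 21 cases with 5 nuclei each (case IDs are placeholders).
    nucleus_case_ids = np.repeat(np.arange(21), 5)
    train, val, test = case_wise_split(nucleus_case_ids)
    print(len(train), len(val), len(test))
    print(labeling_index([1, 0, 1, 1]))
```

Splitting on case identifiers rather than on nuclei is what keeps nuclei from the same patient out of both the training and the test partitions; a nucleus-level random split would not give that guarantee.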