Five-Gene Expression Formula Accurately Detects Hepatocellular Carcinoma Tumors
Journal:
arXiv
Published Date:
Jun 27, 2025
Abstract
Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related
deaths worldwide. Several diagnostic methods, such as imaging modalities and
serum alpha-fetoprotein (AFP) testing, have been used for HCC detection;
however, their effectiveness is limited to later stages of the disease. In
contrast, transcriptomic analysis of biopsy samples has shown promise for early
detection. While machine learning techniques have been applied to
transcriptomic data for cancer detection, their clinical adoption remains
limited due to challenges such as poor generalizability across different
datasets, lack of interpretability, and high computational complexity. To
address these limitations, we developed a novel predictive formula for HCC
detection using the Kolmogorov-Arnold Network (KAN). This formula is based on
the expression levels of five genes: VIPR1, CYP1A2, FCN3, ECM1, and LIFR.
Derived from the GSE25097 dataset, the formula offers a simple, interpretable,
efficient, and accessible approach for HCC identification. It achieves 99%
accuracy on the GSE25097 test set and demonstrates robust performance on six
additional independent datasets, with accuracies above 90% in all
cases. These findings highlight the critical role of these five genes as
biomarkers for HCC detection, offering a foundation for future research and
clinical applications to improve HCC diagnostic approaches.