Semantic Caching for Improving Web Affordability
Journal: arXiv
Published Date: Jun 25, 2025
Abstract
The rapid growth of web content has led to increasingly large webpages,
posing significant challenges for Internet affordability, especially in
developing countries where data costs remain prohibitively high. We propose
semantic caching using Large Language Models (LLMs) to improve web
affordability by enabling reuse of semantically similar images within webpages.
Analyzing 50 leading news and media websites, encompassing 4,264 images and
over 40,000 image pairs, we demonstrate potential for significant data transfer
reduction, with some website categories showing up to 37% of images as
replaceable. Our proof-of-concept architecture shows users can achieve
approximately 10% greater byte savings compared to exact caching. We evaluate
both commercial and open-source multi-modal LLMs for assessing semantic
replaceability. GPT-4o performs best with a low Normalized Root Mean Square
Error of 0.1735 and a weighted F1 score of 0.8374, while the open-source LLaMA
3.1 model shows comparable performance, highlighting its viability for
large-scale applications. This approach offers benefits for both users and
website operators, substantially reducing data transmission. We discuss ethical
concerns and practical challenges, including semantic preservation, user-driven
cache configuration, privacy concerns, and potential resistance from website
operators.
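
To make the comparison between exact and semantic caching concrete, the following is a minimal sketch and not the architecture evaluated in the paper. It assumes a hypothetical `score` function standing in for an LLM's replaceability judgment, a hypothetical acceptance `threshold` of 0.8, and invented image URLs and sizes. It counts the bytes a page load would transfer with no cache, with an exact (URL-match) cache, and with a semantic cache that also reuses cached images judged replaceable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    url: str
    size_bytes: int


def bytes_transferred(page, cached, score=None, threshold=1.0):
    """Return the bytes fetched over the network for one page load.

    An image is skipped if an identical URL is cached (exact caching) or,
    when a score function is supplied, if some cached image is judged
    replaceable with score >= threshold (semantic caching).
    """
    total = 0
    for img in page:
        exact_hit = any(c.url == img.url for c in cached)
        semantic_hit = score is not None and any(
            score(c, img) >= threshold for c in cached
        )
        if not (exact_hit or semantic_hit):
            total += img.size_bytes
    return total


# Hypothetical replaceability scores in [0, 1]; in practice these would
# come from a multi-modal model, which is not called here.
TOY_SCORES = {("a.jpg", "b.jpg"): 0.9}


def toy_score(cached_img, requested_img):
    return TOY_SCORES.get((cached_img.url, requested_img.url), 0.0)


# Invented example data: two images already cached, three on the new page.
cached = [Image("a.jpg", 100_000), Image("logo.png", 20_000)]
page = [Image("logo.png", 20_000), Image("b.jpg", 120_000), Image("c.jpg", 80_000)]

no_cache = bytes_transferred(page, cached=[])
exact = bytes_transferred(page, cached)
semantic = bytes_transferred(page, cached, score=toy_score, threshold=0.8)

print(no_cache, exact, semantic)  # 220000 200000 80000
```

In this toy setup the exact cache only avoids re-downloading the logo, while the semantic cache also substitutes the cached `a.jpg` for `b.jpg`. The size of the gap depends entirely on the invented scores and threshold and says nothing about the savings reported above.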