Position Paper: Metadata Enrichment Model: Integrating Neural Networks and Semantic Knowledge Graphs for Cultural Heritage Applications
Journal: arXiv
Published Date: May 29, 2025
Abstract
The digitization of cultural heritage collections has opened new directions
for research, yet the lack of enriched metadata poses a substantial challenge
to accessibility, interoperability, and cross-institutional collaboration. In
recent years, neural network models such as YOLOv11 and Detectron2 have
revolutionized visual data analysis, but their application to domain-specific
cultural artifacts - such as manuscripts and incunabula - remains limited by
the absence of methodologies that address structural feature extraction and
semantic interoperability. In this position paper, we argue that the
integration of neural networks with semantic technologies represents a paradigm
shift in cultural heritage digitization processes. We present the Metadata
Enrichment Model (MEM), a conceptual framework designed to enrich metadata for
digitized collections by combining fine-tuned computer vision models, large
language models (LLMs), and structured knowledge graphs. The Multilayer Vision
Mechanism (MVM) is the key innovation of MEM. This iterative process
improves visual analysis by dynamically detecting nested features, such as text
within seals or images within stamps. To demonstrate MEM's potential, we apply it to
a dataset of digitized incunabula from the Jagiellonian Digital Library and
release a manually annotated dataset of 105 manuscript pages. We examine the
practical challenges of deploying MEM in real-world GLAM institutions, including
the need for domain-specific fine-tuning, the alignment of enriched metadata
with Linked Data standards, and computational costs. We present MEM as a
flexible and extensible methodology. This paper contributes to the discussion
on how artificial intelligence and semantic web technologies can advance
cultural heritage research and be applied in practice.
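
The iterative, nested detection that the abstract attributes to the MVM can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` class, the `detect_nested` function, the `max_depth` limit, and the `toy_detector` stand-in are all hypothetical names introduced here, and a real system would replace the stub with a fine-tuned vision model. The sketch only shows the general idea of re-running a detector inside each detected region so that features such as text within a seal are recorded as children of the enclosing feature.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (x, y, width, height) in pixels
Box = Tuple[int, int, int, int]

# Hypothetical detector interface (an assumption, not a real library API):
# given a region in absolute page coordinates, return (label, box) pairs
# whose boxes are relative to that region's top-left corner.
Detector = Callable[[Box], List[Tuple[str, Box]]]


@dataclass
class Detection:
    label: str
    box: Box  # absolute page coordinates
    children: List["Detection"] = field(default_factory=list)


def detect_nested(region: Box, detector: Detector, max_depth: int = 3) -> List[Detection]:
    """Run the detector on a region, then again inside every detected feature."""
    if max_depth == 0:
        return []
    region_x, region_y, _, _ = region
    found = []
    for label, (x, y, w, h) in detector(region):
        # Convert the region-relative box to absolute page coordinates.
        absolute = (region_x + x, region_y + y, w, h)
        node = Detection(label, absolute)
        # Look for features nested inside this one (e.g. text within a seal).
        node.children = detect_nested(absolute, detector, max_depth - 1)
        found.append(node)
    return found


# Toy stand-in for a vision model: one seal on the page, one text line in the seal.
PAGE: Box = (0, 0, 1000, 1400)
SEAL: Box = (600, 900, 200, 200)


def toy_detector(region: Box) -> List[Tuple[str, Box]]:
    if region == PAGE:
        return [("seal", (600, 900, 200, 200))]
    if region == SEAL:
        return [("text", (20, 80, 160, 40))]
    return []


if __name__ == "__main__":
    for feature in detect_nested(PAGE, toy_detector):
        print(feature.label, feature.box, [c.label for c in feature.children])
```

The resulting tree of detections is the kind of structure that could later be mapped to knowledge graph entities, with the parent-child links expressing containment; how MEM performs that mapping is not specified in the abstract and is left out of the sketch.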