Who Owns the Output? Bridging Law and Technology in LLMs Attribution
Journal: arXiv
Published Date: Mar 29, 2025
Abstract
Since the introduction of ChatGPT in 2022, Large Language Models (LLMs) and
Large Multimodal Models (LMMs) have transformed content creation, enabling the
generation of human-quality content across every medium: text, images, video,
and audio. Generative AI models offer vast opportunities, drastically reducing
the time required to generate content and often raising its quality. However,
considering the complexity and
the difficult traceability of the generated content, the use of these tools
poses challenges in attributing AI-generated content. Attribution is difficult
for a variety of reasons, ranging from the lack of systematic fingerprinting
of generated content to the enormous amount of data on which LLMs and LMMs are
trained, which makes it difficult to connect generated content to the training
data. This scenario is raising
concerns about intellectual property and ethical responsibilities. To address
these concerns, this paper bridges the technological, ethical, and legislative
aspects by reviewing the legislative and technological instruments available
today and proposing a legal framework to ensure accountability. Finally, we
present three use cases showing how these instruments can be combined to
ensure that attribution is respected. However, even though the techniques
available today can ensure attribution to a greater extent, strong limitations
remain that can be solved only by developing new attribution techniques
applicable to LLMs and LMMs.