Trustworthiness in Stochastic Systems: Towards Opening the Black Box
Journal: arXiv
Published Date: Jan 27, 2025
Abstract
AI systems are increasingly tasked with completing responsibilities under
decreasing oversight. This delegation requires users to accept certain risks,
typically mitigated by perceived or actual alignment of values between humans
and AI, leading to confidence that the system will act as intended. However,
stochastic behavior by an AI system threatens to undermine alignment and
potential trust. In this work, we take a philosophical perspective on the
tension and potential conflict between stochasticity and trustworthiness. We
demonstrate how stochasticity complicates traditional methods of establishing
trust and evaluate two extant approaches to managing it: (1) eliminating
user-facing stochasticity to create deterministic experiences, and (2) allowing
users to independently control tolerances for stochasticity. We argue that both
approaches are insufficient, as not all forms of stochasticity affect
trustworthiness in the same way or to the same degree. Instead, we introduce a
novel definition of stochasticity and propose latent value modeling for both AI
systems and users to better assess alignment. This work lays a foundational
step toward understanding how and when stochasticity impacts trustworthiness,
enabling more precise trust calibration in complex AI systems, and underscoring
the importance of sociotechnical analyses to effectively address these
challenges.