Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback
Journal: arXiv
Published Date: Jan 27, 2025
Abstract
Reinforcement learning (RL) has demonstrated success in automating insulin
dosing in simulated type 1 diabetes (T1D) patients but is currently unable to
incorporate patient expertise and preference. This work introduces PAINT
(Preference Adaptation for INsulin control in T1D), an original RL framework
for learning flexible insulin dosing policies from patient records. PAINT
employs a sketch-based approach for reward learning, where past data is
annotated with a continuous reward signal to reflect the patient's desired
outcomes. Labelled data trains a reward model, informing the actions of a novel
safety-constrained offline RL algorithm, designed to restrict actions to a safe
strategy and enable preference tuning via a sliding scale. In-silico evaluation
shows PAINT achieves common glucose goals through simple labelling of desired
states, reducing glycaemic risk by 15% relative to a commercial benchmark. Action
labelling can also be used to incorporate patient expertise, demonstrating an
ability to pre-empt meals (+10% time-in-range post-meal) and address certain
device errors (-1.6% variance post-error) with patient guidance. These results
hold under realistic conditions, including limited samples, labelling errors,
and intra-patient variability. This work illustrates PAINT's potential in
real-world T1D management and, more broadly, in any task requiring rapid and
precise preference learning under safety constraints.
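
The two ideas named in the abstract, fitting a reward model to annotated data and tuning behaviour with a sliding scale inside a safety bound, can be illustrated with a minimal Python sketch. This is not PAINT's implementation: the quadratic "sketched" label, the ridge-regression reward model, the blending rule, and every function name and parameter below are hypothetical choices made only for illustration.

```python
import numpy as np

def features(glucose):
    """Hypothetical features of a glucose reading in mg/dL: [1, g, g^2], g scaled by 100."""
    g = np.asarray(glucose, dtype=float) / 100.0
    return np.stack([np.ones_like(g), g, g ** 2], axis=-1)

def fit_reward_model(glucose, sketched_reward, ridge=1e-3):
    """Fit ridge-regression weights mapping features to the annotated reward."""
    X = features(glucose)
    y = np.asarray(sketched_reward, dtype=float)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def predict_reward(weights, glucose):
    """Predicted reward for one reading or an array of readings."""
    return features(glucose) @ weights

def blend_action(safe_dose, preferred_dose, scale, max_deviation):
    """Sliding scale in [0, 1]: 0 keeps the safe dose, 1 moves toward the
    preference-driven dose, never further than max_deviation from the safe
    dose and never below zero (assumed safety rule, not the paper's)."""
    scale = float(np.clip(scale, 0.0, 1.0))
    proposal = (1.0 - scale) * safe_dose + scale * preferred_dose
    bounded = np.clip(proposal, safe_dose - max_deviation, safe_dose + max_deviation)
    return float(max(0.0, bounded))

# Synthetic "past data": glucose readings with a noisy continuous annotation
# that peaks near 120 mg/dL (an assumed stand-in for a patient's sketch).
rng = np.random.default_rng(0)
glucose = np.linspace(40.0, 300.0, 53)
labels = 1.0 - ((glucose - 120.0) / 100.0) ** 2 + rng.normal(0.0, 0.05, glucose.size)

weights = fit_reward_model(glucose, labels)
```

In this sketch, `predict_reward(weights, 120.0)` scores higher than a reading of 50 or 280 mg/dL, and `blend_action(1.0, 2.0, scale, 0.5)` moves from the safe dose toward the preferred dose as `scale` increases while staying within 0.5 units of the safe dose.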