Prompt-Aware Scheduling for Efficient Text-to-Image Inferencing System
Journal: arXiv
Published Date: Jan 29, 2025
Abstract
Traditional ML models use controlled approximations during high loads,
employing faster but less accurate models in a process called accuracy
scaling. However, this method is less effective for generative text-to-image
models due to their sensitivity to input prompts and performance degradation
caused by large model loading overheads. This work introduces a novel
text-to-image inference system that optimally matches prompts across multiple
instances of the same model operating at various approximation levels to
deliver high-quality images under high loads and fixed budgets.
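
To make the idea of matching prompts to approximation levels concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical per-prompt quality prediction for each approximation level (`scores`) and a hypothetical number of free slots per level (`capacity`), neither of which the abstract specifies. It uses a simple greedy heuristic, which is not guaranteed to be optimal: prompts that lose the most quality under approximation are placed first, so they get the least approximated instances while slots remain.

```python
from typing import Dict, List


def assign_prompts(scores: Dict[str, List[float]],
                   capacity: List[int]) -> Dict[str, int]:
    """Greedy illustration only (hypothetical inputs, not the paper's method).

    scores[p][k]: assumed predicted quality of prompt p at level k,
                  where level 0 is the least approximated instance.
    capacity[k]:  assumed number of prompts level k can serve right now.
    Returns a mapping from prompt to the chosen level index.
    """
    if sum(capacity) < len(scores):
        raise ValueError("not enough capacity for all prompts")

    remaining = list(capacity)

    # Most sensitive prompts first: largest predicted quality loss between
    # the least and the most approximated level.
    order = sorted(scores,
                   key=lambda p: scores[p][0] - scores[p][-1],
                   reverse=True)

    assignment: Dict[str, int] = {}
    for p in order:
        free = [k for k in range(len(remaining)) if remaining[k] > 0]
        # Highest predicted quality among free levels; on a tie, prefer the
        # more approximated (higher-index) level to keep accurate slots free.
        best = max(free, key=lambda k: (scores[p][k], k))
        assignment[p] = best
        remaining[best] -= 1
    return assignment


if __name__ == "__main__":
    demo_scores = {
        "a": [0.9, 0.5],   # degrades sharply under approximation
        "b": [0.9, 0.85],  # degrades slightly
        "c": [0.8, 0.8],   # insensitive to approximation
    }
    print(assign_prompts(demo_scores, capacity=[1, 2]))
```

In this toy input, the single slot at the least approximated level goes to the prompt that degrades most, and the two less sensitive prompts are served by the approximated instance. A real system would also need to obtain the quality predictions and account for latency and budget constraints, which this sketch leaves out.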