Mutation-Guided LLM-based Test Generation at Meta
Journal: arXiv
Published Date: Jan 22, 2025
Abstract
This paper describes Meta's ACH system for mutation-guided LLM-based test
generation. ACH generates relatively few mutants (aka simulated faults),
compared to traditional mutation testing. Instead, it focuses on generating
currently undetected faults that are specific to an issue of concern. From
these currently uncaught faults, ACH generates tests that can catch them,
thereby 'killing' the mutants and consequently hardening the platform against
regressions. We use privacy concerns to illustrate our approach, but ACH can
harden code against any type of regression. In total, ACH was applied to
10,795 Android Kotlin classes in 7 software platforms deployed by Meta, from
which it generated 9,095 mutants and 571 privacy-hardening test cases. ACH also
deploys an LLM-based equivalent mutant detection agent that achieves a
precision of 0.79 and a recall of 0.47 (rising to 0.95 and 0.96 with simple
pre-processing). ACH was used by Messenger and WhatsApp test-a-thons where
engineers accepted 73% of its tests, judging 36% to be privacy relevant. We
conclude that ACH hardens code against specific concerns and that, even when
its tests do not directly tackle the specific concern, engineers find them
useful for their other benefits.
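
To make the notion of a test "killing" a mutant concrete, the following is a minimal illustrative sketch in Python. It is not ACH's implementation, and ACH itself targets Android Kotlin classes. Every name here (`redact_original`, `redact_mutant`, `existing_test`, `hardening_test`, `kills`) is hypothetical and chosen only for illustration. The sketch assumes a mutant counts as killed when a test passes on the original code and fails on the mutant.

```python
# Hypothetical example; none of these names come from the ACH paper.

def redact_original(record: dict) -> dict:
    """Original code: strip the private field before logging."""
    return {k: v for k, v in record.items() if k != "email"}

def redact_mutant(record: dict) -> dict:
    """Mutant: a simulated privacy fault that leaks the private field."""
    return dict(record)

def existing_test(redact) -> bool:
    """Existing test: only checks that a public field survives."""
    return redact({"id": 7, "email": "a@example.com"})["id"] == 7

def hardening_test(redact) -> bool:
    """New test: checks that the private field is removed."""
    return "email" not in redact({"id": 7, "email": "a@example.com"})

def kills(test, original, mutant) -> bool:
    """Assumed definition: the test passes on the original and fails on the mutant."""
    return test(original) and not test(mutant)

# The mutant is currently uncaught: the existing test cannot tell it apart.
print(kills(existing_test, redact_original, redact_mutant))   # False
# The new test catches the simulated fault, so it kills the mutant.
print(kills(hardening_test, redact_original, redact_mutant))  # True
```

In this toy setting the mutant survives the existing test, which is the kind of currently undetected fault the abstract describes, and the added test is what hardens the code against that regression.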