Decision Tree Induction Through LLMs via Semantically-Aware Evolution
Journal: arXiv
Published Date: Mar 18, 2025
Abstract
Decision trees are a crucial class of models offering robust predictive
performance and inherent interpretability across various domains, including
healthcare, finance, and logistics. However, current tree induction methods
have notable limitations: greedy methods often yield suboptimal solutions,
while exact optimization approaches incur prohibitive computational costs and
have limited applicability. To address these challenges, we propose an evolutionary
optimization method for decision tree induction based on genetic programming
(GP). Our key innovation is the integration of semantic priors and
domain-specific knowledge about the search space into the optimization
algorithm. To this end, we introduce $\texttt{LLEGO}$, a framework that
incorporates semantic priors into genetic search operators through the use of
Large Language Models (LLMs), thereby enhancing search efficiency and targeting
regions of the search space that yield decision trees with superior
generalization performance. This is operationalized through novel genetic
operators that work with structured natural language prompts, effectively
utilizing LLMs as conditional generative models and sources of semantic
knowledge. Specifically, we introduce $\textit{fitness-guided}$ crossover to
exploit high-performing regions, and $\textit{diversity-guided}$ mutation for
efficient global exploration of the search space. These operators are
controlled by corresponding hyperparameters that enable a more nuanced balance
between exploration and exploitation across the search space. Empirically, we
demonstrate across various benchmarks that $\texttt{LLEGO}$ evolves trees that
outperform those of existing tree induction methods and searches significantly
more efficiently than conventional GP approaches.
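
To make the search loop described above concrete, the following is a minimal sketch of an evolutionary loop with pluggable crossover and mutation operators. It is not the LLEGO implementation: the LLM-backed operators are replaced by simple numeric stand-ins, the "trees" are one-split stumps, and every name here (`evolve`, `softmax_weights`, `toy_crossover`, `toy_mutation`, the `temperature` parameter) is a hypothetical choice made for illustration only.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical one-split "tree": predict 1 if x > threshold, else 0.
Stump = Dict[str, float]


def softmax_weights(fitnesses: Sequence[float], temperature: float) -> List[float]:
    """Turn fitness scores into sampling weights; lower temperature is greedier."""
    top = max(fitnesses)
    exps = [math.exp((f - top) / temperature) for f in fitnesses]
    total = sum(exps)
    return [e / total for e in exps]


def evolve(
    population: Sequence[Stump],
    fitness_fn: Callable[[Stump], float],
    crossover_fn: Callable,
    mutation_fn: Callable,
    generations: int = 5,
    n_offspring: int = 8,
    temperature: float = 0.2,  # assumed exploration/exploitation knob
    seed: int = 0,
) -> List[Stump]:
    """Elitist evolutionary loop; operators are injected as callables."""
    rng = random.Random(seed)
    pop = list(population)
    size = len(pop)
    for _ in range(generations):
        fits = [fitness_fn(t) for t in pop]
        weights = softmax_weights(fits, temperature)
        offspring = []
        for i in range(n_offspring):
            # Fitter individuals are more likely to be chosen as parents.
            parents = rng.choices(pop, weights=weights, k=2)
            if i % 2 == 0:
                child = crossover_fn(parents, fitness_fn, rng)
            else:
                child = mutation_fn(parents, rng)
            offspring.append(child)
        # Keep the best `size` individuals among parents and offspring.
        pop = sorted(pop + offspring, key=fitness_fn, reverse=True)[:size]
    return pop


# Toy 1-D dataset: label is 1 when x > 0.62.
DATA = [(x / 10.0, int(x / 10.0 > 0.62)) for x in range(11)]


def accuracy(stump: Stump) -> float:
    hits = sum(int(x > stump["threshold"]) == y for x, y in DATA)
    return hits / len(DATA)


def toy_crossover(parents, fitness_fn, rng) -> Stump:
    # Stand-in for a fitness-guided operator: fitness-weighted average.
    w = [fitness_fn(p) + 1e-9 for p in parents]
    t = sum(wi * p["threshold"] for wi, p in zip(w, parents)) / sum(w)
    return {"threshold": t}


def toy_mutation(parents, rng) -> Stump:
    # Stand-in for an exploratory operator: sample a fresh threshold.
    return {"threshold": rng.uniform(0.0, 1.0)}


initial = [{"threshold": t} for t in (0.05, 0.15, 0.25, 0.95)]
final = evolve(initial, accuracy, toy_crossover, toy_mutation)
```

In a setup closer to the one the abstract describes, `crossover_fn` and `mutation_fn` would instead build structured natural language prompts from the parent trees and parse the LLM's response into a new tree; the loop itself would not need to change. Because the selection step here is elitist, the best fitness in the population never decreases between generations.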