LLM Meeting Decision Trees on Tabular Data
Journal:
arXiv
Published Date:
May 23, 2025
Abstract
Tabular data have been playing a vital role in diverse real-world fields,
including healthcare, finance, etc. With the recent success of Large Language
Models (LLMs), early explorations of extending LLMs to the domain of tabular
data have been developed. Most of these LLM-based methods typically first
serialize tabular data into natural language descriptions, and then tune LLMs
or directly infer on these serialized data. However, these methods suffer from
two key inherent issues: (i) data perspective: existing data serialization
methods lack universal applicability for structured tabular data, and may pose
privacy risks through direct textual exposure, and (ii) model perspective: LLM
fine-tuning methods struggle with tabular data, and in-context learning
scalability is bottle-necked by input length constraints (suitable for few-shot
learning). This work explores a novel direction of integrating LLMs into
tabular data throughough logical decision tree rules as intermediaries,
proposes a decision tree enhancer with LLM-derived rule for tabular prediction,
DeLTa. The proposed DeLTa avoids tabular data serialization, and can be applied
to full data learning setting without LLM fine-tuning. Specifically, we
leverage the reasoning ability of LLMs to redesign an improved rule given a set
of decision tree rules. Furthermore, we provide a calibration method for
original decision trees via new generated rule by LLM, which approximates the
error correction vector to steer the original decision tree predictions in the
direction of ``errors'' reducing. Finally, extensive experiments on diverse
tabular benchmarks show that our method achieves state-of-the-art performance.