Applications of Large Language Model Reasoning in Feature Generation
Journal: arXiv
Published Date: Mar 15, 2025
Abstract
Large Language Models (LLMs) have revolutionized natural language processing
through their state-of-the-art reasoning capabilities. This paper explores the
convergence of LLM reasoning techniques and feature generation for machine
learning tasks. We examine four key reasoning approaches: Chain of Thought,
Tree of Thoughts, Retrieval-Augmented Generation, and Thought Space
Exploration. Our analysis reveals how these approaches can be used to identify
effective feature generation rules without having to manually specify search
spaces. The paper categorizes LLM-based feature generation methods across
various domains including finance, healthcare, and text analytics. In
healthcare, LLMs can extract key information from clinical notes and radiology
reports, enabling more efficient data utilization. In finance, LLMs
facilitate text generation, summarization, and entity extraction from complex
documents. We analyze evaluation methodologies for assessing feature quality
and downstream performance, with particular attention to OCTree's decision-tree
reasoning approach, which provides language-based feedback for iterative
improvements. Current challenges include hallucination, computational
efficiency, and domain adaptation. As of March 2025, emerging approaches
include inference-time compute scaling, reinforcement learning, and supervised
fine-tuning with model distillation. Future directions point toward multimodal
feature generation, self-improving systems, and neuro-symbolic approaches. This
paper provides a detailed overview of an emerging field that promises to
automate and enhance feature engineering through language model reasoning.