CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning
Journal: arXiv
Published Date: May 31, 2025
Abstract
Computer-Aided Design (CAD) plays a pivotal role in industrial manufacturing.
Orthographic projection reasoning underpins the entire CAD workflow,
encompassing design, manufacturing, and simulation. However, prevailing
deep-learning approaches substitute standard 3D reconstruction pipelines for
this reasoning, which often introduces imprecise dimensions and limits the
parametric editability that CAD workflows require. More recently, some
researchers have adopted vision-language models (VLMs), typically adapted via
supervised fine-tuning (SFT), to tackle CAD-related challenges. SFT shows
promise but often devolves into pattern memorization, yielding poor
out-of-distribution performance on complex reasoning tasks. To address these
gaps, we introduce CReFT-CAD, a two-stage fine-tuning paradigm. It first
employs a curriculum-driven reinforcement learning stage with difficulty-aware
rewards to build reasoning ability progressively, and then applies supervised
post-tuning to hone instruction following and semantic extraction.
Complementing this, we release TriView2CAD, the first
large-scale, open-source benchmark for orthographic projection reasoning,
comprising 200,000 synthetic and 3,000 real-world orthographic projections with
precise dimension annotations and six interoperable data modalities. We
benchmark leading VLMs on orthographic projection reasoning and demonstrate
that CReFT-CAD substantially improves reasoning accuracy and
out-of-distribution generalizability in real-world scenarios, offering valuable
insights for advancing CAD reasoning research.