Repurposing the scientific literature with vision-language models

Journal: arXiv
Published Date:

Abstract

Leading vision-language models (VLMs) are trained on general Internet content, overlooking scientific journals' rich, domain-specific knowledge. Training on specialty-specific literature could yield high-performance, task-specific tools, enabling generative AI to match generalist models in specialty publishing, educational, and clinical tasks. We created NeuroPubs, a multimodal dataset of 23,000 Neurosurgery Publications articles (134M words, 78K image-caption pairs). Using NeuroPubs, VLMs generated publication-ready graphical abstracts (70% of 100 abstracts) and board-style questions indistinguishable from human-written ones (54% of 89,587 questions). We used these questions to train CNS-Obsidian, a 34B-parameter VLM. In a blinded, randomized controlled trial, our model demonstrated non-inferiority to the then state-of-the-art GPT-4o in neurosurgical differential diagnosis (clinical utility, 40.62% upvotes vs. 57.89%, p=0.1150; accuracy, 59.38% vs. 65.79%, p=0.3797). Our pilot study demonstrates how training generative AI models on specialty-specific journal content, without large-scale Internet data, results in high-performance academic and clinical tools, enabling domain-tailored AI across diverse fields.

Authors

  • Anton Alyakin
  • Jaden Stryker
  • Daniel Alexander Alber
  • Karl L. Sangwon
  • Jin Vivian Lee
  • Brandon Duderstadt
  • Akshay Save
  • David Kurland
  • Spencer Frome
  • Shrutika Singh
  • Jeff Zhang
  • Eunice Yang
  • Ki Yun Park
  • Cordelia Orillac
  • Aly A. Valliani
  • Sean Neifert
  • Albert Liu
  • Aneek Patel
  • Christopher Livia
  • Darryl Lau
  • Ilya Laufer
  • Peter A. Rozman
  • Eveline Teresa Hidalgo
  • Howard Riina
  • Rui Feng
  • Todd Hollon
  • Yindalon Aphinyanaphongs
  • John G. Golfinos
  • Laura Snyder
  • Eric Leuthardt
  • Douglas Kondziolka
  • Eric Karl Oermann