Encoding of pretrained large language models mirrors the genetic architectures of human psychological traits

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

Recent advances in large language models (LLMs) have prompted a rush to use them as universal translators for biomedical terms. However, the black-box nature of LLMs has forced researchers to rely on artificially designed benchmarks without understanding what exactly LLMs encode. We demonstrate that pretrained LLMs, without any fine-tuning, can already explain up to 51% of the genetic correlation between items from a psychometrically validated neuroticism questionnaire. For psychiatric diagnoses, we found that disorder names aligned better with genetic relationships than diagnostic descriptions did. Our results indicate that pretrained LLMs have encodings that mirror genetic architectures. These findings highlight LLMs’ potential for validating phenotypes, refining taxonomies, and integrating textual and genetic data in mental health research.

Authors

Bohan Xu; Nick Obradovich; Wenjie Zheng; Robert Loughnan; Lucy Shao; Masaya Misaki; Wesley K. Thompson; Martin Paulus; Chun Chieh Fan
