Machine Learning and Large Language Models for Modeling Complex Toxicity Pathways and Predicting Steroidogenesis.

Journal: Environmental science & technology

Published Date: Jul 15, 2025

Abstract

High-throughput screening and computational models have been effective in predicting chemical interactions with estrogen and androgen receptors, but similar approaches for steroidogenesis remain limited. To address this gap, we developed general steroidogenesis modulation models using data from ∼1,800 chemicals screened in H295R human adrenocortical carcinoma cells. A random forest model was validated using a prospective test set of 20 compounds (14 predicted active, 6 inactive), achieving 80% accuracy with conformal prediction adjustments. In parallel, we built classification and regression models based on IC50 data from ChEMBL for key steroidogenic enzymes, including CYP17A1, CYP21A2, CYP11B1, CYP11B2, 17β-HSD (1/2/3/5), 5α-reductase (1/2), and CYP19A1 (126-9,327 compounds per target). These models enable predictions of both general steroidogenesis inhibition and potential molecular targets. Additionally, we developed a transformer-based model (MolBART) to predict all end points simultaneously and validated this performance. Combined, these models may offer a rapid and scalable system for assessing chemical impacts on steroidogenesis, supporting chemical risk assessment, product stewardship, and regulatory decision-making.
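The abstract does not describe how the conformal prediction adjustment was implemented. As an illustration only, the sketch below shows one common variant, class-conditional (Mondrian) inductive conformal prediction, applied to a binary active/inactive classifier. The calibration scores, the function names, and the choice of nonconformity score (one minus the predicted probability of the candidate label) are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical calibration nonconformity scores, grouped by true label.
# For a calibration compound with true label y, the score is 1 - P(y),
# where P comes from a classifier such as a random forest.
CALIBRATION: Dict[str, List[float]] = {
    "active": [0.05, 0.10, 0.20, 0.30, 0.45],
    "inactive": [0.10, 0.15, 0.25, 0.40, 0.55],
}


def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Share of calibration scores at least as nonconforming as the test score,
    with the usual +1 correction that counts the test compound itself."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(
    prob_active: float,
    calibration: Dict[str, List[float]],
    significance: float,
) -> List[str]:
    """Return every label whose p-value exceeds the significance level."""
    # Nonconformity of each candidate label for the test compound.
    candidate_scores = {"active": 1.0 - prob_active, "inactive": prob_active}
    return [
        label
        for label, score in candidate_scores.items()
        if conformal_p_value(calibration[label], score) > significance
    ]


if __name__ == "__main__":
    # A confident prediction yields a single-label set.
    print(prediction_set(0.92, CALIBRATION, significance=0.2))
    # A borderline prediction at a stricter significance level keeps both labels.
    print(prediction_set(0.60, CALIBRATION, significance=0.1))
```

In this scheme a compound can receive one label, both labels, or none. Because the abstract does not say how accuracy was scored, the reported 80% figure should not be read as having been computed this way.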

Authors

Thomas R Lane

Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Patricia A Vignaux

Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Joshua S Harris

Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Scott H Snyder

Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Fabio Urbina

Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA.
Sean Ekins

Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA; Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA; Phoenix Nest, Inc., P.O. Box 150057, Brooklyn, NY 11215, USA; Hereditary Neuropathy Foundation, 401 Park Avenue South, 10th Floor, New York, NY 10016, USA. Electronic address: ekinssean@yahoo.com.

Keywords

Cell Line, Tumor; Humans; Large Language Models; Machine Learning; Steroids

External Resources

PubMed ID (PMID): 40576990
