Comparative Analysis of Data Generation Techniques for Breast Cancer Research Using Artificial Intelligence.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:

Abstract

This study investigates the use of ChatGPT to support clinical teams that have limited expertise in generating synthetic data for breast cancer research. It assesses ChatGPT's application to this task, focusing on effective prompting and best practices for creating high-fidelity synthetic data. The generated synthetic data are compared with the Wisconsin Breast Cancer Dataset through statistical analysis, structural similarity metrics, and machine learning performance. Results indicate that prompt quality and generation technique significantly affect the fidelity of the data. The study underscores the need for precise prompts and generation methods to maintain data integrity in sensitive areas of healthcare research such as cancer research.
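
As an illustration of what a per-feature statistical comparison between real and synthetic data might look like, the sketch below computes a two-sample Kolmogorov-Smirnov statistic for each column. This is an assumption for illustration only: the abstract does not say which statistical tests the authors used, and the function names (ks_statistic, compare_features), the feature names, and the toy values are hypothetical and not taken from the study or from the Wisconsin Breast Cancer Dataset.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical distributions, 1.0 means the
    samples do not overlap at all.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a = sorted(real)
    b = sorted(synthetic)
    # Both CDFs are step functions, so the gap can only peak at an
    # observed value; checking every observed point is sufficient.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def compare_features(real_rows, synthetic_rows, feature_names):
    """Return {feature name: KS statistic} for column-aligned rows."""
    scores = {}
    for j, name in enumerate(feature_names):
        real_col = [row[j] for row in real_rows]
        synth_col = [row[j] for row in synthetic_rows]
        scores[name] = ks_statistic(real_col, synth_col)
    return scores


# Toy placeholder values, NOT real measurements from any dataset.
real_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
synthetic_rows = [[1.0, 40.0], [2.0, 50.0], [3.0, 60.0]]
scores = compare_features(real_rows, synthetic_rows, ["feature_a", "feature_b"])
```

In this toy case the first column is reproduced exactly and the second does not overlap with the real values at all, so the two scores sit at opposite ends of the scale. A lower score per feature would suggest closer agreement with the real distribution, though a single marginal statistic says nothing about correlations between features or about downstream model performance.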

Authors

  • Tia M Pope
    North Carolina A&T State University, Greensboro, NC, United States.
  • Ahmad Patooghy
    North Carolina A&T State University, Greensboro, NC, United States.