Machine learning enables legal risk assessment in internet healthcare using HIPAA data.
Journal:
Scientific Reports
Published Date:
Aug 5, 2025
Abstract
This study explores how artificial intelligence technologies can enhance regulatory capacity for legal risks in internet healthcare, using a machine learning (ML) analytical framework and data from the Health Insurance Portability and Accountability Act (HIPAA) database. The research methods comprise data collection and processing, construction and optimization of ML models, and application of a risk assessment framework. First, the data are drawn from the HIPAA database and cover several data types, including medical records, patient personal information, and treatment costs. Second, to address missing values and noise, the data are preprocessed through denoising, normalization, and feature extraction to ensure data quality and model accuracy. Finally, for model selection, the study experiments with several common algorithms, including extreme gradient boosting (XGBoost), support vector machine (SVM), random forest (RF), and deep neural network (DNN), each of which has strengths and limitations that depend on the specific legal risk assessment task. RF improves classification performance by combining multiple decision trees; SVM classifies efficiently by finding the maximum-margin hyperplane; DNN is well suited to modeling complex nonlinear relationships; and XGBoost further improves classification accuracy by optimizing an ensemble of decision trees through gradient boosting. Model performance is evaluated with accuracy, recall, precision, F1 score, and area under the curve (AUC). The experimental results indicate that the DNN model performs strongly on F1 score, accuracy, and recall, demonstrating its efficiency and stability in legal risk assessment. The principal component analysis-random forest (PCA+RF) and RF models also perform stably, making them suitable for a range of application scenarios.
In contrast, the SVM and K-nearest neighbor (KNN) models perform relatively worse; although they remain valid in certain contexts, their overall performance is inferior to that of the deep learning and ensemble learning methods. This study not only provides effective ML tools for legal risk assessment in internet healthcare but also offers theoretical support and practical guidance for future research in this field.
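The abstract names missing-value handling and normalization as preprocessing steps but does not say which techniques were used. As a minimal, dependency-free sketch, assuming mean imputation and min-max scaling (both illustrative choices, not confirmed as the authors' method), the two steps could look like this for purely numeric features such as treatment costs. The function name `impute_and_normalize` is hypothetical, and denoising and feature extraction are not covered.

```python
from statistics import mean


def impute_and_normalize(rows):
    """Hypothetical helper: fill missing values (None) with the column mean,
    then min-max scale every column to the range [0, 1].

    rows: list of equal-length lists of numbers or None.
    """
    n_cols = len(rows[0])

    # Column means are computed over observed (non-missing) values only;
    # a column with no observed values falls back to 0.0.
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)

    # Mean imputation: replace each None with its column mean.
    filled = [
        [col_means[j] if r[j] is None else float(r[j]) for j in range(n_cols)]
        for r in rows
    ]

    # Min-max normalization; a constant column maps to 0.0 to avoid dividing by zero.
    result = [row[:] for row in filled]
    for j in range(n_cols):
        col = [r[j] for r in filled]
        lo, hi = min(col), max(col)
        span = hi - lo
        for i in range(len(filled)):
            result[i][j] = (filled[i][j] - lo) / span if span else 0.0
    return result


if __name__ == "__main__":
    # Toy records: [treatment cost, number of visits], with gaps marked as None.
    records = [[100.0, 2.0], [None, 4.0], [300.0, None]]
    print(impute_and_normalize(records))
```

In practice, the imputation statistics and scaling bounds should be computed on the training split only and then reused on the test split, so that evaluation data do not leak into preprocessing.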
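The evaluation metrics listed in the abstract have standard definitions. The following sketch computes them for a binary task, assuming label 1 marks a high-legal-risk case and 0 a low-risk case (an assumption; the abstract does not describe the label scheme). Accuracy, precision, recall, and F1 come from confusion-matrix counts. AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the ROC curve. The function names, toy scores, and the 0.5 decision threshold are illustrative and do not come from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]  # hypothetical model scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

For the toy data above, 7 of the 9 positive/negative pairs are ranked correctly, so the AUC is 7/9. Unlike the other four metrics, AUC uses the raw scores rather than thresholded predictions, which is why it can separate models that look similar at a single threshold.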