Robustness and sex differences in skin cancer detection: logistic regression vs CNNs
Journal: arXiv
Published Date: Apr 15, 2025
Abstract
Deep learning has been reported to achieve high performance in the detection
of skin cancer, yet many challenges regarding the reproducibility of results
and biases remain. This study is a replication (different data, same analysis)
of a study on Alzheimer's disease [28] that examined the robustness of logistic
regression (LR) and convolutional neural networks (CNNs) across patient sexes.
We explore sex bias in skin cancer detection, using the PAD-UFES-20 dataset
with LR trained on handcrafted features reflecting dermatological guidelines
(ABCDE and the 7-point checklist), and a pre-trained ResNet-50 model. We
evaluate these models in alignment with [28]: across multiple training datasets
with varied sex composition to determine their robustness. Our results show
that both the LR and the CNN were robust to the sex distribution of the
training data, but the CNN had a significantly higher accuracy (ACC) and area
under the receiver operating characteristic curve (AUROC) for male patients
than for female patients. We hope these findings contribute to the growing
body of work investigating potential bias in popular medical machine learning
methods. The data and scripts needed to reproduce our results are available in
our GitHub repository.
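
The evaluation protocol described above (training sets with varied sex composition, then ACC and AUROC reported per sex) can be illustrated with a minimal sketch. It is not the authors' code: the `Sample` record, the function names, the 0.5 decision threshold, and the fixed training-set size are all assumptions made for illustration, and the model that produces the scores is left out.

```python
import random
from collections import namedtuple

# Hypothetical record type (an assumption, not from the paper):
# `sex` is "F" or "M", `label` is 1 for malignant and 0 for benign.
Sample = namedtuple("Sample", ["sex", "label", "features"])


def sample_by_sex_ratio(samples, female_fraction, size, seed=0):
    """Draw a training set of `size` samples with a given share of female patients."""
    rng = random.Random(seed)
    females = [s for s in samples if s.sex == "F"]
    males = [s for s in samples if s.sex == "M"]
    n_female = round(size * female_fraction)
    n_male = size - n_female
    if n_female > len(females) or n_male > len(males):
        raise ValueError("not enough samples of one sex for this ratio and size")
    subset = rng.sample(females, n_female) + rng.sample(males, n_male)
    rng.shuffle(subset)
    return subset


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((score >= threshold) == bool(label) for label, score in zip(labels, scores))
    return hits / len(labels)


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_by_sex(samples, scores):
    """Report ACC and AUROC separately for each sex in a test set."""
    report = {}
    for sex in ("F", "M"):
        idx = [i for i, s in enumerate(samples) if s.sex == sex]
        labels = [samples[i].label for i in idx]
        sex_scores = [scores[i] for i in idx]
        report[sex] = {"acc": accuracy(labels, sex_scores),
                       "auroc": auroc(labels, sex_scores)}
    return report
```

In such a setup, one would call `sample_by_sex_ratio` once per target composition, fit the LR or the CNN on each resulting subset, and pass the test-set scores to `metrics_by_sex`. Holding `size` constant across compositions keeps differences in performance from being confounded with training-set size.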