A machine learning approach to predict ethnicity using personal name and census location in Canada.

Journal: PloS one
Published Date:

Abstract

BACKGROUND: Canada is an ethnically diverse country, yet the lack of ethnicity information in many large databases impedes effective population research and interventions. Automated ethnicity classification using machine learning has shown potential to address this data gap, but its performance in Canada is largely unknown. This study implemented a large-scale machine learning framework to predict ethnicity using a novel set of name and census location features.

Authors

  • Kai On Wong
    School of Public Health, University of Alberta, Edmonton, Alberta, Canada.
  • Osmar R Zaïane
    Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
  • Faith G Davis
    School of Public Health, University of Alberta, Edmonton, Alberta, Canada.
  • Yutaka Yasui
    Department of Gastroenterology and Hepatology, Musashino Red Cross Hospital, Tokyo, Japan.