Learning about individuals' health from aggregate data.

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference

Published Date: Jul 1, 2017

Abstract

There is growing awareness that user-generated social media content contains valuable health-related information and is more convenient to collect than typical health data. For example, Twitter has been employed to predict aggregate-level outcomes, such as regional rates of diabetes and child poverty, and to identify individual cases of depression and food poisoning. Models which make aggregate-level inferences can be induced from aggregate data, and consequently are straightforward to build. In contrast, learning models that produce individual-level (IL) predictions, which are more informative, usually requires a large number of difficult-to-acquire labeled IL examples. This paper presents a new machine learning method which achieves the best of both worlds, enabling IL models to be learned from aggregate labels. The algorithm makes predictions by combining unsupervised feature extraction, aggregate-based modeling, and optimal integration of aggregate-level and IL information. Two case studies illustrate how to learn health-relevant IL prediction models using only aggregate labels, and show that these models perform as well as state-of-the-art models trained on hundreds or thousands of labeled individuals.

Authors

Rich Colbaugh
Kristin Glass

Keywords

Algorithms Health Records, Personal Humans Machine Learning Social Media

External Resources

View on PubMed Access via DOI PubMed (29060555)

Learning about individuals' health from aggregate data.

Abstract

Authors

Keywords

External Resources

Popular Topics

Recent Journals