Massive data clustering by multi-scale psychological observations.

Journal: National Science Review

Published Date: Oct 8, 2021

Abstract

Clustering, the discovery of latent group structure in data, is a fundamental problem in artificial intelligence and a vital procedure in data-driven scientific research across all disciplines. Yet existing methods have various limitations, especially weak cognitive interpretability and poor computational scalability, when clustering the massive datasets that are increasingly available in all domains. Here, by simulating the multi-scale cognitive observation process of humans, we design a scalable algorithm to detect clusters hierarchically hidden in massive datasets. The observation scale changes according to the Weber-Fechner law, capturing the gradually emerging meaningful grouping structure. We validated our approach on real datasets with up to a billion records and 2000 dimensions, including taxi trajectories, single-cell gene expression, face images, computer logs and audio. Our approach outperformed popular methods in usability, efficiency, effectiveness and robustness across different domains.
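
The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea of observing data at a sequence of scales, not the authors' method. It assumes that a Weber-Fechner-style schedule can be approximated by growing the scale by a constant fraction at each step (a geometric sequence), and it uses a brute-force distance-threshold grouping that would not scale to massive data. The names `weber_scales`, `cluster_count_at_scale` and the parameter `k` are hypothetical.

```python
import math


def weber_scales(s_min, s_max, k=1.0):
    """Geometric scale schedule (assumption): each step enlarges the
    observation scale by a constant fraction k of its current value."""
    scales = []
    s = s_min
    while s <= s_max:
        scales.append(s)
        s *= 1.0 + k
    return scales


def cluster_count_at_scale(points, scale):
    """Count groups of points linked by chains of pairwise distances
    no larger than `scale` (union-find, brute force O(n^2))."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= scale:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Two well-separated toy groups in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
scales = weber_scales(0.5, 20, k=1.0)
counts = [cluster_count_at_scale(points, s) for s in scales]
print(list(zip(scales, counts)))
```

In this toy data the count stays at two groups over a wide range of scales before everything merges into one, which is the kind of persistent grouping a multi-scale observation would be expected to surface.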

Authors

Shusen Yang
National Engineering Laboratory of Big Data Analytics, Xi'an Jiaotong University, Xi'an 710049, China.

Liwen Zhang
National Engineering Laboratory of Big Data Analytics, Xi'an Jiaotong University, Xi'an 710049, China.

Chen Xu
Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N 6N5, Canada.

Hanqiao Yu
National Engineering Laboratory of Big Data Analytics, Xi'an Jiaotong University, Xi'an 710049, China.

Jianqing Fan
Center for Statistics and Machine Learning, Princeton University, Princeton, NJ 08544, USA.

Zongben Xu
National Engineering Laboratory of Big Data Analytics, Xi'an Jiaotong University, Xi'an 710049, China.

External Resources

PubMed ID: 35242339
