Augmenting sparse behavior data for user identity linkage with self-generated by model and mixup-generated samples.

Journal: Neural networks : the official journal of the International Neural Network Society

PMID: 40081271

Abstract

The user identity linkage task aims to associate user accounts belonging to the same individual by utilizing user data. This task is relevant in domains such as recommendation systems, where user-generated content (i.e., behavioral data) serves as the key information for identifying users. However, user identity linkage that relies on behavioral data faces two primary challenges arising from data sparsity: insufficient user behavior data and the presence of low-frequency behavior items. These issues hinder accurate modeling and exacerbate representation errors. To address these challenges, we propose two data augmentation methods: samples self-generated by the model and mixup-generated samples. Collectively, these methods are referred to as SGAMDA (Self-generated by Model and Mixup-generated Samples-based Data Augmentation). The self-generated samples method uses Variational Autoencoders (VAEs) to generate new training data by decoding samples in the representation space. The mixup-generated samples method creates new training data by mixing the behavior data of different user groups, thereby alleviating data sparsity. SGAMDA categorizes user behavior data by data volume and by the proportion of low-frequency behaviors, and uses these categories to guide the two data augmentation strategies. We evaluate SGAMDA on the Movies2Books and CDs2Movies datasets for user identity linkage. The results show that SGAMDA significantly improves prediction accuracy and that the proposed data augmentation methods enhance behavior representation.
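The abstract does not give SGAMDA's categorization thresholds, its exact mixing rule, or which user category is routed to which augmentation strategy. As a rough illustration only, the sketch below shows two of the ideas it names: bucketing users by data volume (history length) and by the share of low-frequency behavior items, and then mixing behavior data across groups. It assumes multi-hot behavior vectors, hypothetical thresholds (`min_items`, `max_count_low`, `max_low_ratio`), hypothetical category labels, and standard mixup with lambda drawn from Beta(alpha, alpha). It is not the authors' implementation, and the VAE-based self-generation step is not shown.

```python
import random
from collections import Counter

# Hypothetical sketch: thresholds and category names are illustrative
# assumptions, not values or labels taken from the SGAMDA paper.


def item_frequencies(histories):
    """Count how many times each item appears across all users' behavior histories."""
    counts = Counter()
    for items in histories.values():
        counts.update(items)
    return counts


def categorize_user(items, freq, min_items=5, max_count_low=1, max_low_ratio=0.5):
    """Bucket a user by history length (data volume) and share of low-frequency items.

    Assumed rule: an item is low-frequency if it appears at most
    `max_count_low` times in the whole dataset.
    """
    if len(items) < min_items:
        return "sparse"  # too little behavior data
    low = sum(1 for it in items if freq[it] <= max_count_low)
    if low / len(items) > max_low_ratio:
        return "long_tail"  # dominated by low-frequency items
    return "dense"


def to_vector(items, vocab):
    """Multi-hot encoding of a behavior history over a fixed item vocabulary."""
    index = {it: i for i, it in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for it in items:
        vec[index[it]] = 1.0
    return vec


def mixup(vec_a, vec_b, alpha=0.4, rng=random):
    """Standard mixup: convex combination with lambda ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]
    return mixed, lam


# Toy usage: four users with made-up item histories.
histories = {
    "u1": ["m1", "m2", "m3", "m4", "m5", "m6"],
    "u2": ["m1", "m2"],
    "u3": ["m1", "m2", "m7", "m8", "m9"],
    "u4": ["m1", "m2", "m3", "m4", "m5"],
}
freq = item_frequencies(histories)
groups = {u: categorize_user(items, freq) for u, items in histories.items()}
vocab = sorted(freq)

# Mix a user from the "sparse" group with one from the "dense" group.
mixed, lam = mixup(
    to_vector(histories["u2"], vocab),
    to_vector(histories["u1"], vocab),
    rng=random.Random(0),
)
print(groups)  # {'u1': 'dense', 'u2': 'sparse', 'u3': 'long_tail', 'u4': 'dense'}
```

Mixing multi-hot vectors is the simplest choice for a toy example; the paper may instead mix item sequences or learned embeddings, which the abstract does not specify.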

Authors

Hongren Huang

Beijing Advanced Innovation Center for Big Data and Brain Computing, China; School of Computer Science and Engineering, Beihang University, Beijing, China. Electronic address: by1806183@buaa.edu.cn.
Jianxin Li

Department of Ultrasonography, Weihai Municipal Hospital, Shandong, China.
Feihong Lu

Beijing Advanced Innovation Center for Big Data and Brain Computing, China; School of Computer Science and Engineering, Beihang University, Beijing, China. Electronic address: lufeihong@buaa.edu.cn.
Lihong Wang
Qian Li

Emergency and Critical Care Center, Department of Emergency Medicine, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, Zhejiang, China.
Qingyun Sun

Keywords

Algorithms; Behavior; Humans; Neural Networks, Computer

