Augmenting sparse behavior data for user identity linkage with self-generated by model and mixup-generated samples.

Journal: Neural networks : the official journal of the International Neural Network Society
PMID:

Abstract

The user identity linkage task aims to associate accounts that belong to the same individual by using user data. The task is relevant in domains such as recommendation systems, where user-generated content (i.e., behavioral data) serves as the key information for identifying users. However, user identity linkage based on behavioral data faces two primary challenges caused by data sparsity: insufficient user behavior data and the presence of low-frequency behavior items. These issues hinder accurate modeling and exacerbate representation errors. To address them, we propose two data augmentation methods: model self-generated samples and mixup-generated samples. Together, these methods are referred to as SGAMDA (Self-generated by Model and Mixup-generated Samples-based Data Augmentation). The self-generated samples method uses Variational Autoencoders to generate new training data by decoding samples in the representation space. The mixup-generated samples method creates new training data by mixing the behavior data of different user groups, thereby alleviating data sparsity. SGAMDA categorizes user behavior data by data volume and by the proportion of low-frequency behaviors, and uses these categories to guide the two augmentation strategies. We evaluate SGAMDA on the Movies2Books and CDs2Movies datasets for user identity linkage. The results show that SGAMDA significantly improves prediction accuracy, indicating that the proposed augmentation methods enhance behavior representation.
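
As an illustration of the mixup idea mentioned in the abstract, the following minimal sketch blends the behavior data of two users into one synthetic training sample. It is not the authors' implementation: the function name `mixup_behaviors`, the bag-of-items dictionary representation, the Beta-distributed mixing weight, and the `alpha` value are all assumptions borrowed from the standard mixup formulation, and the abstract does not specify how SGAMDA selects user groups or mixing weights.

```python
import random


def mixup_behaviors(sparse_user, dense_user, alpha=0.4, rng=None):
    """Blend two item->weight behavior vectors (hypothetical representation).

    Standard mixup: mixed = lam * a + (1 - lam) * b, with lam ~ Beta(alpha, alpha).
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    items = set(sparse_user) | set(dense_user)
    mixed = {
        item: lam * sparse_user.get(item, 0.0)
        + (1.0 - lam) * dense_user.get(item, 0.0)
        for item in items
    }
    return mixed, lam


# Toy data (made up for illustration): one sparse user, one data-rich user.
sparse = {"item_a": 1.0}
dense = {"item_a": 2.0, "item_b": 3.0, "item_c": 1.0}

mixed, lam = mixup_behaviors(sparse, dense, rng=random.Random(0))
print(sorted(mixed), round(lam, 3))
```

The synthetic sample covers the union of both users' items, so a user with very little data gains additional, down-weighted behavior items from a data-rich user.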

Authors

  • Hongren Huang
    Beijing Advanced Innovation Center for Big Data and Brain Computing, China; School of Computer Science and Engineering, Beihang University, Beijing, China. Electronic address: by1806183@buaa.edu.cn.
  • Jianxin Li
    Department of Ultrasonography, Weihai Municipal Hospital, Shandong, China.
  • Feihong Lu
    Beijing Advanced Innovation Center for Big Data and Brain Computing, China; School of Computer Science and Engineering, Beihang University, Beijing, China. Electronic address: lufeihong@buaa.edu.cn.
  • Lihong Wang
  • Qian Li
    Emergency and Critical Care Center, Department of Emergency Medicine, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, Zhejiang, China.
  • Qingyun Sun