Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity

Journal: arXiv

Published Date: May 17, 2025

Abstract

Human preference plays a crucial role in the refinement of large language models (LLMs). However, collecting human preference feedback is costly, and most existing datasets neglect the correlation between personalization and preferences. To address this issue, we introduce Fair-PP, a synthetic dataset of personalized preferences targeting social equity, derived from real-world social survey data, which includes 28 social groups, 98 equity topics, and 5 personal preference dimensions. Leveraging GPT-4o-mini, we engage in role-playing based on seven representative persona portrayals guided by existing social survey data, yielding a total of 238,623 preference records. Through Fair-PP, we also contribute (i) an automated framework for generating preference data, along with a more fine-grained dataset of personalized preferences; (ii) an analysis of the positioning of existing mainstream LLMs across five major global regions within the personalized preference space; and (iii) a sample reweighting method for personalized preference alignment, enabling alignment with a target persona while maximizing the divergence from other personas. Empirical experiments show that our method outperforms the baselines.

Authors

Qi Zhou
Jie Zhang
Dongxia Wang
Qiang Liu
Tianlin Li
Jin Song Dong
Wenhai Wang
Qing Guo

External Resources

View on arXiv (http://arxiv.org/abs/2505.11861v1)
