Deep Contrastive Learning for Feature Alignment: Insights from Housing-Household Relationship Inference
Journal: arXiv
Published Date: Feb 16, 2025
Abstract
Housing and household characteristics are key determinants of social and
economic well-being, yet our understanding of their interrelationships remains
limited. This study addresses this knowledge gap by developing a deep
contrastive learning (DCL) model to infer housing-household relationships using
the American Community Survey (ACS) Public Use Microdata Sample (PUMS). More
broadly, the proposed model is suitable for a class of problems where the goal
is to learn joint relationships between two distinct entities without
explicitly labeled ground truth data. Our proposed dual-encoder DCL approach
leverages co-occurrence patterns in PUMS and introduces a bisecting K-means
clustering method to overcome the absence of ground truth labels. The
dual-encoder DCL architecture is designed to handle the semantic differences
between housing (building) and household (people) features while mitigating
noise introduced by clustering. To validate the model, we generate a synthetic
ground truth dataset and conduct comprehensive evaluations. The model also
outperforms state-of-the-art methods in capturing housing-household
relationships in Delaware. A
transferability test in North Carolina confirms its generalizability across
diverse sociodemographic and geographic contexts. Finally, a post-hoc
explainable AI analysis using SHAP values reveals that tenure status and
mortgage information play a more significant role in housing-household matching
than traditionally emphasized factors such as the number of persons and rooms.
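
To make the dual-encoder idea concrete, the following is a minimal sketch of a contrastive objective over paired housing and household embeddings. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the loss, so a symmetric InfoNCE loss is swapped in here, the two "encoders" are plain random linear maps, and every name (symmetric_info_nce, W_housing, W_household, the feature sizes, the temperature) is hypothetical. The clustering step that supplies pseudo-labels in the paper is omitted; co-occurring rows are simply treated as positive pairs.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(housing_emb, household_emb, temperature=0.1):
    # Assumption: row i of each matrix is a co-occurring (positive) pair,
    # and all other rows in the batch act as negatives.
    h = l2_normalize(housing_emb)
    p = l2_normalize(household_emb)
    logits = h @ p.T / temperature
    idx = np.arange(len(h))
    loss_h2p = -log_softmax(logits)[idx, idx].mean()    # housing -> household
    loss_p2h = -log_softmax(logits.T)[idx, idx].mean()  # household -> housing
    return 0.5 * (loss_h2p + loss_p2h)

# Toy data: 8 records with 5 housing features and 7 household features
# (hypothetical sizes, not taken from PUMS).
rng = np.random.default_rng(0)
housing_x = rng.normal(size=(8, 5))
household_x = rng.normal(size=(8, 7))

# Two separate encoders, one per entity type, projecting into a shared 4-d space.
W_housing = rng.normal(size=(5, 4))
W_household = rng.normal(size=(7, 4))
housing_emb = housing_x @ W_housing
household_emb = household_x @ W_household

loss_untrained = symmetric_info_nce(housing_emb, household_emb)
print(f"loss with untrained encoders: {loss_untrained:.4f}")
```

Keeping a separate encoder per side lets building attributes and people attributes have different dimensionality and semantics while still being compared in one embedding space. In a real model the linear maps would be replaced by trained neural networks and the loss minimized by gradient descent.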