Who Are You Behind the Screen? Implicit MBTI and Gender Detection Using Artificial Intelligence
Journal:
arXiv
Published Date:
Mar 12, 2025
Abstract
In personalized technology and psychological research, accurately detecting
demographic features and personality traits from digital interactions is
increasingly important. Conventional personality prediction techniques mostly
depend on explicitly self-reported labels; this work instead investigates
implicit categorization, inferring personality and gender variables directly
from linguistic patterns in Telegram conversation data. We fine-tune a Transformer-based
language model (RoBERTa) to capture complex linguistic cues indicative of
personality traits and gender differences using a dataset comprising 138,866
messages from 1,602 users annotated with MBTI types and 195,016 messages from
2,598 users annotated with gender. Applying confidence levels raises model
accuracy to 86.16%, demonstrating RoBERTa's capacity to reliably identify
implicit personality types from conversational text data. Our results
highlight the usefulness of Transformer architectures for implicit personality
and gender classification and expose important trade-offs between accuracy and
coverage in realistic conversational environments. For gender classification,
the model reached an accuracy of 74.4%, capturing gender-specific language
patterns.
Personality dimension analysis showed that people with introverted and
intuitive preferences are especially active in text-based interactions.
This study underscores the practical challenge of balancing accuracy against
data coverage, while showing that Transformer-based models are effective for
implicit personality and gender prediction from conversational texts.
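
The trade-off between accuracy and coverage can be illustrated with a minimal
sketch. It assumes that "confidence levels" means keeping only predictions
whose confidence score meets a threshold, which the abstract does not spell
out, so this may differ from the authors' actual procedure. The function name
`accuracy_coverage`, the threshold values, and the toy confidences and labels
are all hypothetical and are not taken from the paper's data.

```python
from typing import List, Optional, Tuple


def accuracy_coverage(
    confidences: List[float],
    predictions: List[str],
    labels: List[str],
    threshold: float,
) -> Tuple[Optional[float], float]:
    """Score only the predictions whose confidence is >= threshold.

    Returns (accuracy on the kept predictions, fraction of items kept).
    Accuracy is None when no prediction passes the threshold.
    """
    kept = [
        (pred, gold)
        for conf, pred, gold in zip(confidences, predictions, labels)
        if conf >= threshold
    ]
    coverage = len(kept) / len(labels) if labels else 0.0
    if not kept:
        return None, coverage
    accuracy = sum(pred == gold for pred, gold in kept) / len(kept)
    return accuracy, coverage


# Toy, made-up values for illustration only (not results from the paper).
confidences = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40]
predictions = ["INTJ", "ENFP", "INFP", "INTJ", "ESTJ", "INFP"]
labels = ["INTJ", "ENFP", "INFP", "ENTP", "ESTJ", "ISFJ"]

for threshold in (0.0, 0.5, 0.7):
    acc, cov = accuracy_coverage(confidences, predictions, labels, threshold)
    print(f"threshold={threshold:.1f} accuracy={acc:.2f} coverage={cov:.2f}")
```

In this toy example, raising the threshold increases accuracy on the retained
predictions while shrinking the share of items that receive a prediction at
all, which is the kind of trade-off the abstract describes.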