Harnessing Psycho-lingual and Crowd-Sourced Dictionaries for Predicting Taboos in Written Emotional Disclosure in Anonymous Confession Boards.

Journal: Journal of Healthcare Informatics Research
Published Date:

Abstract

Over the last decade, the health informatics community has made many efforts to develop systems that can automatically recognize and predict disclosures on social media. However, most of these efforts have focused on simple topic prediction or sentiment classification. Taboo disclosures, which people are not comfortable discussing with their friends, represent a more abstract theme that depends on context and background. Recent research has demonstrated the efficacy of injecting concepts into the learning model to improve prediction. We present a vectorization scheme that combines corpus- and lexicon-based approaches for predicting taboo topics from anonymous social media datasets. The proposed vectorization scheme exploits two context-rich lexicons: LIWC and Urban Dictionary. Our methodology achieves cross-validation accuracies of up to 78.1% for the supervised learning task on the Facebook Confessions dataset and 70.5% for the transfer learning task on the YikYak dataset. For both tasks, supervised algorithms trained with features generated by the proposed vectorizer perform better than those trained with a vanilla representation. This work presents a novel methodology for predicting taboos from anonymous emotional disclosures on confession boards.
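The abstract does not spell out how the corpus- and lexicon-based features are combined. As an illustration only, the following minimal sketch assumes the simplest possible combination: bag-of-words counts concatenated with per-category lexicon counts. The toy lexicon, its category names, and the function names are all hypothetical stand-ins; they do not reproduce LIWC, Urban Dictionary, or the authors' actual vectorizer.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> category labels). A stand-in for a
# real lexicon such as LIWC; the entries and categories are invented.
TOY_LEXICON = {
    "ashamed": ["negative_emotion"],
    "secret": ["concealment"],
    "drunk": ["substance"],
    "wasted": ["substance"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(corpus):
    """Collect the sorted set of tokens seen in the corpus."""
    return sorted({tok for doc in corpus for tok in tokenize(doc)})


def hybrid_vector(text, vocabulary, lexicon, categories):
    """Concatenate corpus-based word counts with lexicon category counts.

    Assumption: plain concatenation; the paper may combine them differently.
    """
    tokens = tokenize(text)
    word_counts = Counter(tokens)
    category_counts = Counter(
        cat for tok in tokens for cat in lexicon.get(tok, [])
    )
    corpus_part = [word_counts[w] for w in vocabulary]
    lexicon_part = [category_counts[c] for c in categories]
    return corpus_part + lexicon_part


corpus = [
    "I got drunk and I am ashamed",
    "I keep a secret from my friends",
]
vocabulary = build_vocabulary(corpus)
categories = sorted({c for cats in TOY_LEXICON.values() for c in cats})
vectors = [
    hybrid_vector(doc, vocabulary, TOY_LEXICON, categories) for doc in corpus
]
```

Each resulting vector could then be passed to any supervised classifier. The lexicon part gives the model category-level signals even for words that are rare in the training corpus, which is one plausible reason such features might help over a vanilla representation.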

Authors

  • Arindam Paul
    Northwestern University, Evanston, IL 60201 USA.
  • Wei-Keng Liao
    Northwestern University, Evanston, IL 60201 USA.
  • Alok Choudhary
    Northwestern University, Evanston, IL 60201 USA.
  • Ankit Agrawal
    Northwestern University, Evanston, IL 60201 USA.
