Improving Feature Representation Based on a Neural Network for Author Profiling in Social Media Texts.

Journal: Computational Intelligence and Neuroscience
Published Date:

Abstract

We introduce a lexical resource for preprocessing social media data and show that it enhances a neural network-based feature representation. In experiments on the PAN 2015 and PAN 2016 author profiling corpora, we obtained better results when the data were preprocessed with this resource. The resource comprises dictionaries of slang words, contractions, abbreviations, and emoticons commonly used in social media, each built for English, Spanish, Dutch, and Italian. The resource is freely available.
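
As a rough illustration of dictionary-based preprocessing of this kind, the sketch below replaces tokens found in small lookup tables with a standardized form. The dictionary entries, the placeholder tags such as `<smile>`, and the `normalize` function are hypothetical and are not taken from the published resource. The abstract does not say how matched tokens are handled, so replacing them with a standard form is an assumption.

```python
# Hypothetical entries for illustration only; not from the published resource.
SLANG = {"u": "you", "gr8": "great"}
CONTRACTIONS = {"can't": "cannot", "i'm": "i am"}
ABBREVIATIONS = {"btw": "by the way"}
EMOTICONS = {":)": "<smile>", ":(": "<sad>"}


def normalize(text: str) -> str:
    """Lowercase the text and replace whitespace-separated tokens
    that appear in any of the dictionaries (assumed behavior)."""
    lookup = {**SLANG, **CONTRACTIONS, **ABBREVIATIONS, **EMOTICONS}
    tokens = text.lower().split()
    return " ".join(lookup.get(tok, tok) for tok in tokens)


print(normalize("BTW I'm sure u can't miss it :)"))
```

Splitting on whitespace keeps the sketch short. Real social media text would likely need a tokenizer that separates punctuation attached to words, and a separate dictionary set per language.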

Authors

  • Helena Gómez-Adorno
    Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico.
  • Ilia Markov
    Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico.
  • Grigori Sidorov
    Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico.
  • Juan-Pablo Posadas-Durán
    Instituto Politécnico Nacional (IPN), Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco (ESIME-Zacatenco), Mexico City, Mexico.
  • Miguel A Sanchez-Perez
    Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico.
  • Liliana Chanona-Hernandez
    Instituto Politécnico Nacional (IPN), Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco (ESIME-Zacatenco), Mexico City, Mexico.