POCASUM: Policy Categorizer and Summarizer Based on Text Mining and Machine Learning

Journal: Soft Computing
Published Date:

Abstract

Control over one's personal data is both a right and a duty of every citizen in our digital society. Users often skip the policies of applications and websites entirely to save time and energy, without realizing the potential sticking points in those policies. Because of obscure language and verbose explanations, the majority of hypermedia users do not bother to read them. Moreover, digital media companies sometimes do not put enough effort into stating their policies clearly, and the policies themselves are often incomplete. A summarized version of these privacy policies, categorized into useful information, can help users. To address this problem, we propose a machine learning based policy categorizer that classifies policy paragraphs under proposed attributes such as security and contact. By benchmarking different machine learning based classifier models, we show that an artificial neural network model achieves higher accuracy than the other models tested on a challenging dataset of textual privacy policies. We thus show that machine learning can help summarize the relevant paragraphs under the various attributes, so that the user can get the gist of each topic within a few lines.
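The categorize-then-summarize idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy training sentences, the two labels, and the function names are hypothetical, and a small Naive Bayes classifier plus word-frequency sentence scoring stand in for the neural network and summarizer the authors evaluate, whose details are not given here.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training data (not the paper's dataset or full label set).
TRAIN = [
    ("We encrypt your data and protect it with secure servers.", "security"),
    ("Access to stored data is restricted and protected by encryption.", "security"),
    ("You can contact us by email or phone with any questions.", "contact"),
    ("Questions about this policy can be sent to our support email address.", "contact"),
]

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(examples):
    """Count words per label; a stand-in for the paper's ANN classifier."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(paragraph, model):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(paragraph):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def summarize(paragraph, max_sentences=1):
    """Keep the sentences whose words are most frequent in the paragraph.

    Summed frequencies favor longer sentences; this is a deliberate
    simplification, not the paper's summarization method.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    freq = Counter(tokenize(paragraph))
    ranked = sorted(sentences, key=lambda s: sum(freq[t] for t in tokenize(s)), reverse=True)
    chosen = set(ranked[:max_sentences])
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in chosen)
```

In use, each policy paragraph would first be passed to `classify` to assign it an attribute, and the paragraphs collected under each attribute would then be shortened with `summarize`, giving the reader a few lines per topic.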

Authors

  • Rushikesh Deotale
    School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
  • Shreyash Rawat
    School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
  • V Vijayarajan
    School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
  • V B Surya Prasath
    Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati OH 45229 USA. Departments of Pediatrics, Biomedical Informatics, Electrical Engineering and Computer Science, University of Cincinnati College of Medicine, Cincinnati, OH USA.
