Stylomech: Unveiling Authorship via Computational Stylometry in English and Romanized Sinhala
Journal: arXiv
Published Date: Jan 16, 2025
Abstract
With the advent of Web 2.0, developments in social technology, coupled with
global communication, have brought both positive and negative impacts to
society. Copyright claims and author identification are deemed crucial, as
content violations have increased considerably owing to a lack of proper ethics
in society. Authorship attribution in both English and Romanized Sinhala has
become a major requirement over the last few decades. Because this area remains
largely unexplored, particularly for Romanized Sinhala, this research makes a
significant contribution to computational linguistics. The proposed authorship
attribution system offers a distinctive approach: it compares only two sets of
text, the suspected author's text and the anonymous text. This departs from
traditional methodologies, which often rely on larger corpora. This work
represents pairs of texts written by the same author and by different authors
numerically, so that the model is trained on these representations rather than
on raw text. This allows it to apply to a multitude of authors and contexts,
provided that the suspected author's text and the anonymous text are of
reasonable quality. By expanding the scope of authorship attribution to
encompass diverse linguistic contexts, the work contributes to fostering trust
and accountability in digital communication, especially in Sri Lanka. This
research presents a pioneering approach to authorship attribution in both
English and Romanized Sinhala, addressing a critical need for content
verification and intellectual property rights enforcement in the digital age.
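The pairwise idea described above can be illustrated with a minimal Python sketch. The sketch turns a (suspected-author text, anonymous text) pair into one numerical vector and trains a model on such vectors rather than on the texts themselves. Every specific choice here is an assumption for illustration, not the method reported in this work:

* the features are hashed character-trigram frequencies, picked because they need no language-specific tokenizer and so apply equally to English and Romanized Sinhala;
* the pair encoding is an absolute difference;
* the classifier is a plain logistic regression;
* the names `style_vector`, `pair_vector`, `train_logistic`, `predict_same_author` and `NUM_BUCKETS` are hypothetical.

```python
import math
import zlib
from collections import Counter

# Hypothetical feature size: character trigrams are hashed into this many buckets
# so that every text maps to a vector of the same length.
NUM_BUCKETS = 256


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return the overlapping character n-grams of the lowercased text."""
    t = text.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)]


def style_vector(text: str, n: int = 3, buckets: int = NUM_BUCKETS) -> list[float]:
    """Relative frequencies of hashed character n-grams (all zeros if the text is too short)."""
    grams = char_ngrams(text, n)
    vec = [0.0] * buckets
    if not grams:
        return vec
    for gram, count in Counter(grams).items():
        # zlib.crc32 is stable across runs, unlike Python's salted built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % buckets] += count
    total = len(grams)
    return [v / total for v in vec]


def pair_vector(suspect: str, anonymous: str) -> list[float]:
    """Encode a (suspect, anonymous) pair as the element-wise absolute difference."""
    a, b = style_vector(suspect), style_vector(anonymous)
    return [abs(x - y) for x, y in zip(a, b)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(pairs: list[list[float]], labels: list[int],
                   epochs: int = 200, lr: float = 0.5) -> tuple[list[float], float]:
    """Fit logistic regression by batch gradient descent; label 1 = same author, 0 = different."""
    dim, m = len(pairs[0]), len(pairs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(pairs, labels):
            # Prediction error on the log-loss for this pair.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_same_author(w: list[float], b: float, pair: list[float]) -> float:
    """Probability that the two texts behind `pair` share an author."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, pair)) + b)


if __name__ == "__main__":
    # Toy labelled pairs (hypothetical): 1 = same author, 0 = different authors.
    train = [
        (pair_vector("the cat sat on the mat", "the cat sat on a mat"), 1),
        (pair_vector("the cat sat on the mat", "oya kohomada machan"), 0),
    ]
    w, b = train_logistic([x for x, _ in train], [y for _, y in train])
    query = pair_vector("a cat sat on the mat", "the dog sat on the mat")
    print(f"P(same author) = {predict_same_author(w, b, query):.3f}")
```

The classifier only ever sees difference vectors, never author identities. A model trained on same-author and different-author pairs can therefore, in principle, score a pair from an author it never saw during training, which is the generalization property the abstract emphasizes. A realistic system would replace the toy trigram features with a richer stylometric representation and train on many labelled pairs.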