Publication: Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches
Publisher
Association for Computational Linguistics (ACL)
Abstract
With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token-level pipeline of modules, heavily dependent on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.
Subject
Computer and information sciences, Natural language processing, Turkish, Languages, Text normalization, Computational linguistics, Machine translation