Publication:
Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches

Publisher

Association for Computational Linguistics (ACL)

Abstract

With the growth of the social web, user-generated text data has reached unprecedented volumes. Non-canonical text normalization makes it possible to exploit this data as a practical source of training material for language processing systems. The state of the art in Turkish text normalization consists of a token-level pipeline of modules that depends heavily on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer processing stages. Experiments with various implementations of our approach show that it outperforms the current best-performing system by a large margin.

Subject

Computer and information sciences, Natural language processing, Turkish, Languages, Text normalization, Computational linguistics, Machine translation
