Dependency Parsing of Turkish

Bibliographic Details
Published in	Computational linguistics - Association for Computational Linguistics Vol. 34; no. 3; pp. 357 - 389
Main Authors	Eryiğit, Gülşen; Nivre, Joakim; Oflazer, Kemal
Format	Journal Article
Language	English
Published	MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.09.2008
Subjects	Agglutinative Languages Applied linguistics Computational linguistics Computer and Information Sciences Computer Science Computer Generated Language Analysis Computer science Data- och informationsvetenskap Datalogi Datavetenskap Datorlingvistik HUMANIORA och RELIGIONSVETENSKAP HUMANITIES and RELIGION Information technology Informationsteknik Language Language Typology Languages and linguistics Linguistic subjects Linguistics Lingvistikämnen Morphemes Morphology Syntax Relationship Natural Language Processing Parsing Språkvetenskap TECHNOLOGY TEKNIKVETENSKAP Turkish Turkey
Summary:	The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Especially lesser-studied languages, typologically different from the languages for which methods have originally been developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical units called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We test our claim on two different parsing methods, one based on a probabilistic model with beam search and the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of the parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.
Bibliography:	September, 2008
ISSN:	0891-2017, 1530-9312
DOI:	10.1162/coli.2008.07-017-R1-06-83