Tailoring the automated construction of large-scale taxonomies using the web

Bibliographic Details
Published in: Language Resources and Evaluation, Vol. 47, no. 3, pp. 859-890
Main Authors: Kozareva, Zornitsa; Hovy, Eduard
Format: Journal Article
Language: English
Published: Dordrecht: Springer, 01.09.2013 (Springer Netherlands; Springer Nature B.V.)
Summary: It has long been a dream to have available a single, centralized, semantic thesaurus or terminology taxonomy to support research in a variety of fields. Much human and computational effort has gone into constructing such resources, including the original WordNet and subsequent wordnets in various languages. To produce such resources one has to overcome well-known problems in achieving both wide coverage and internal consistency within a single wordnet and across many wordnets. In particular, one has to ensure that alternative valid taxonomizations covering the same basic terms are recognized and treated appropriately. In this paper we describe a pipeline of new, powerful, minimally supervised, automated algorithms that can be used to construct terminology taxonomies and wordnets, in various languages, by harvesting large amounts of online domain-specific or general text. We illustrate the effectiveness of the algorithms both to build localized, domain-specific wordnets and to highlight and investigate certain deeper ontological problems such as parallel generalization hierarchies. We show shortcomings and gaps in the manually constructed English WordNet in various domains.
ISSN: 1574-020X, 1572-8412, 1574-0218
DOI: 10.1007/s10579-013-9229-0