Part of speech and gramset tagging algorithms for unknown words based on morphological dictionaries of the Veps and Karelian languages

This research is devoted to the low-resource Veps and Karelian languages. The article presents algorithms for assigning part-of-speech tags and grammatical properties to words. These algorithms use our morphological dictionaries, where the lemma, part of speech and a set of grammatical...

Bibliographic Details
Main Authors Krizhanovsky, Andrew; Krizhanovsky, Natalia; Novak, Irina
Format Journal Article
Language English
Published 22.03.2021
Abstract This research is devoted to the low-resource Veps and Karelian languages. The article presents algorithms for assigning part-of-speech tags and grammatical properties to words. These algorithms use our morphological dictionaries, where the lemma, part of speech and a set of grammatical features (gramset) are known for each word form. The algorithms are based on the analogy hypothesis that words with the same suffixes are likely to have the same inflectional models, the same part of speech and the same gramset. The accuracy of these algorithms was evaluated and compared using 313 thousand Vepsian and 66 thousand Karelian words. Special functions were designed to assess the quality of the results. The developed algorithm assigned a correct part of speech to 92.4% of Vepsian words and 86.8% of Karelian words. Our algorithm assigned a correct gramset to 95.3% of Vepsian words and 90.7% of Karelian words. The paper also describes morphological and semantic tagging of texts, which are closely related and inseparable in our corpus processes.
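The analogy hypothesis in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_suffix_index` and `guess_tags`, the longest-suffix-then-majority-vote rule, the tag labels, and the toy word forms (invented placeholder strings, not verified Veps or Karelian forms) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy dictionary of (word form, part of speech, gramset) entries.
# Hypothetical placeholder data, not taken from the authors' dictionaries.
TOY_DICTIONARY = [
    ("kalad", "NOUN", "nominative plural"),
    ("talod", "NOUN", "nominative plural"),
    ("pimed", "ADJ", "nominative plural"),
    ("tulen", "VERB", "1st person singular present"),
    ("panen", "VERB", "1st person singular present"),
]


def build_suffix_index(entries):
    """Map every suffix of every word form to counts of (pos, gramset) pairs."""
    index = defaultdict(Counter)
    for form, pos, gramset in entries:
        for start in range(len(form)):
            index[form[start:]][(pos, gramset)] += 1
    return index


def guess_tags(word, index):
    """Guess (pos, gramset) for an unknown word.

    Assumed rule: take the longest suffix of the word that occurs in the
    dictionary and return the most frequent tag pair seen with that suffix.
    Returns None when no suffix of the word is known.
    """
    for start in range(len(word)):
        suffix = word[start:]
        if suffix in index:
            return index[suffix].most_common(1)[0][0]
    return None


index = build_suffix_index(TOY_DICTIONARY)
print(guess_tags("sanen", index))  # shares the suffix "anen" with "panen"
```

A real system would presumably weight suffix length against frequency and evaluate the guesses against held-out dictionary entries, as the reported accuracy figures suggest; the sketch only shows how shared suffixes can carry tags over to unknown words.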
Author Krizhanovsky, Natalia
Novak, Irina
Krizhanovsky, Andrew
Author_xml – sequence: 1
  givenname: Andrew
  surname: Krizhanovsky
  fullname: Krizhanovsky, Andrew
– sequence: 2
  givenname: Natalia
  surname: Krizhanovsky
  fullname: Krizhanovsky, Natalia
– sequence: 3
  givenname: Irina
  surname: Novak
  fullname: Novak, Irina
BackLink https://doi.org/10.48550/arXiv.2103.11859$$DView paper in arXiv
ContentType Journal Article
Copyright http://creativecommons.org/publicdomain/zero/1.0
Copyright_xml – notice: http://creativecommons.org/publicdomain/zero/1.0
DBID AKY
GOX
DOI 10.48550/arxiv.2103.11859
DatabaseName arXiv Computer Science
arXiv.org
DatabaseTitleList
Database_xml – sequence: 1
  dbid: GOX
  name: arXiv.org
  url: http://arxiv.org/find
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
ExternalDocumentID 2103_11859
GroupedDBID AKY
GOX
IEDL.DBID GOX
IngestDate Mon Jan 08 05:49:47 EST 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
OpenAccessLink https://arxiv.org/abs/2103.11859
ParticipantIDs arxiv_primary_2103_11859
PublicationCentury 2000
PublicationDate 2021-03-22
PublicationDateYYYYMMDD 2021-03-22
PublicationDate_xml – month: 03
  year: 2021
  text: 2021-03-22
  day: 22
PublicationDecade 2020
PublicationYear 2021
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Computation and Language
Computer Science - Information Retrieval
Title Part of speech and gramset tagging algorithms for unknown words based on morphological dictionaries of the Veps and Karelian languages
URI https://arxiv.org/abs/2103.11859
linkProvider Cornell University