Part of Speech and Gramset Tagging Algorithms for Unknown Words Based on Morphological Dictionaries of the Veps and Karelian Languages

This research is devoted to the low-resource Veps and Karelian languages. The article presents algorithms for assigning part-of-speech tags and grammatical properties to words. These algorithms use our morphological dictionaries, where the lemma, part of speech and a set of grammatical...

Bibliographic Details
Published in Data Analytics and Management in Data Intensive Domains, Vol. 1427, pp. 163-177
Main Authors Krizhanovsky, Andrew; Krizhanovskaya, Natalia; Novak, Irina
Format Book Chapter
Language English
Published Switzerland, Springer International Publishing AG, 2021
Springer International Publishing
Series Communications in Computer and Information Science
ISBN 9783030811990
3030811999
ISSN 1865-0929
1865-0937
DOI 10.1007/978-3-030-81200-3_12


Abstract This research is devoted to the low-resource Veps and Karelian languages. The article presents algorithms for assigning part-of-speech tags and grammatical properties to words. These algorithms use our morphological dictionaries, where the lemma, part of speech and a set of grammatical features (gramset) are known for each word form. The algorithms are based on the analogy hypothesis that words with the same suffixes are likely to have the same inflectional models, the same part of speech and the same gramset. The accuracy of these algorithms was evaluated and compared using 66 thousand Karelian and 313 thousand Vepsian words. Special functions were designed to assess the quality of the results of the developed algorithms. 86.8% of Karelian words and 92.4% of Vepsian words were assigned a correct part of speech by the developed algorithm. 90.7% of Karelian words and 95.3% of Vepsian words were assigned a correct gramset by our algorithm. The paper also describes morphological and semantic tagging of texts, which are closely related and inseparable in our corpus processes.
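
The analogy hypothesis in the abstract can be illustrated with a minimal sketch. This is not the authors' published algorithm; it only assumes that an unknown word takes the most frequent part of speech among dictionary words sharing its longest known suffix. The function names `build_suffix_index` and `guess_pos`, the `max_len` parameter, and the toy entries are all hypothetical and not taken from the authors' dictionaries.

```python
from collections import Counter, defaultdict


def build_suffix_index(entries, max_len=4):
    """Count POS tags for every word-final substring of up to max_len characters."""
    index = defaultdict(Counter)
    for word, pos in entries:
        for n in range(1, min(max_len, len(word)) + 1):
            index[word[-n:]][pos] += 1
    return index


def guess_pos(word, index, max_len=4):
    """Return the most frequent POS among words sharing the longest known suffix."""
    # Try the longest suffix first, then fall back to shorter ones.
    for n in range(min(max_len, len(word)), 0, -1):
        counts = index.get(word[-n:])
        if counts:
            return counts.most_common(1)[0][0]
    return None  # no suffix of the word occurs in the dictionary


# Hypothetical toy entries for illustration only; they are not drawn from the
# authors' dictionaries and have not been checked against real Veps or Karelian.
toy_dictionary = [
    ("talod", "NOUN"),
    ("kalad", "NOUN"),
    ("pagizeb", "VERB"),
    ("lugeb", "VERB"),
]
index = build_suffix_index(toy_dictionary)
print(guess_pos("mecad", index))
print(guess_pos("sanub", index))
```

The same lookup could return a gramset instead of a part of speech by storing gramset labels in the counters; how the authors combine suffix length and frequency is described in the chapter itself, not in this sketch.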
Author Krizhanovsky, Andrew (ORCID 0000-0003-3717-2079; andrew.krizhanovsky@gmail.com)
Krizhanovskaya, Natalia (ORCID 0000-0002-9948-1910)
Novak, Irina (ORCID 0000-0002-9436-9460)
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2021
DEWEY 658.4038
Discipline Computer Science
Business
EISBN 3030812006
9783030812003
EISSN 1865-0937
Editor Thalheim, Bernhard
Sychev, Alexander
Makhortov, Sergey
IsPeerReviewed false
IsScholarly false
LCCallNum QA76.9.D343
OCLC 1260401804
PageCount 15
PublicationPlace Cham, Switzerland
PublicationSeriesTitle Communications in Computer and Information Science
PublicationSeriesTitleAlternate Communic.Comp.Inf.Science
PublicationSubtitle 22nd International Conference, DAMDID/RCDL 2020, Voronezh, Russia, October 13-16, 2020, Selected Proceedings
RelatedPersons Filipe, Joaquim (ORCID 0000-0002-5961-6606)
Ghosh, Ashish
Prates, Raquel Oliveira (ORCID 0000-0002-7128-4974)
Zhou, Lizhu
SubjectTerms Low-resource language
Morphological analysis
Part of speech tagging