Quantitative Data on POS Distribution in the Beginnings and the Ends of Utterances in Everyday Russian Speech

Bibliographic Details
Published in Speech and Computer Vol. 11096; pp. 596-605
Main Author Sherstinova, Tatiana
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 01.01.2018
Springer International Publishing
Series Lecture Notes in Computer Science
ISBN 3319995782
9783319995786
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-319-99579-3_61

Abstract The paper presents statistical data on the distribution of parts of speech (POS) at the beginnings and ends of utterances in everyday Russian speech. The material for the study was a morphologically annotated subcorpus of the ORD corpus of spoken Russian, comprising 149,737 tokens and containing fragments of everyday speech from 213 people of different gender, age, and professional groups. The study used n-gram analysis, a method typically employed in computational linguistics to construct probabilistic language models. In the subcorpus as a whole, the most frequent POS turned out to be verbs (17.23%), personal pronouns (15.60%), nouns (14%), particles (13%), and conjunctions (9%). In the initial position of spoken utterances, however, the most frequent POS are particles (19.99%) and conjunctions (12%), while in the final position verbs and nouns are used more often than other POS. Utterance-final verbs are more typical of interrogative (27.66%) and narrative (25.42%) utterances, whereas utterance-final nouns are frequent in exclamative (29.95%) and narrative (24.28%) utterances. In addition, the most typical utterance-initial bigrams and trigrams that start with a particle are presented together with their probabilities. The high percentage of syntactic models with a particle in the initial position of utterances leads to the assumption that these units have special pragmatic functions associated with marking phrase boundaries. The statistical data obtained may be used for modeling everyday utterances in a variety of dialogue systems and for improving Russian speech recognition systems.
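The kind of counting the abstract describes can be illustrated with a minimal sketch. It assumes each utterance is already available as a list of POS tags; the tag names, the toy data, and the function names below are hypothetical and are not taken from the ORD corpus or from the paper's own tooling.

```python
from collections import Counter

# Toy data: each utterance is a list of POS tags.
# Tag names are illustrative only, not the ORD corpus tagset.
utterances = [
    ["PART", "PRON", "VERB"],
    ["CONJ", "PRON", "VERB", "NOUN"],
    ["PART", "VERB", "NOUN"],
    ["PART", "PRON", "VERB"],
]


def position_distribution(utts, index):
    """Relative frequency of POS tags at a fixed position (0 = first, -1 = last)."""
    counts = Counter(u[index] for u in utts if u)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def initial_ngram_probabilities(utts, n):
    """Relative frequency of the first n tags among utterances with at least n tags."""
    counts = Counter(tuple(u[:n]) for u in utts if len(u) >= n)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


initial = position_distribution(utterances, 0)    # utterance-initial POS shares
final = position_distribution(utterances, -1)     # utterance-final POS shares
bigrams = initial_ngram_probabilities(utterances, 2)
```

On this toy data, particles open three of the four utterances, so `initial["PART"]` is 0.75. Filtering `bigrams` to keys whose first element is the particle tag would give the particle-initial bigrams that the abstract mentions.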
Author Sherstinova, Tatiana
Author_xml – sequence: 1
  givenname: Tatiana
  orcidid: 0000-0002-9085-3378
  surname: Sherstinova
  fullname: Sherstinova, Tatiana
  email: t.sherstinova@spbu.ru
  organization: National Research University Higher School of Economics, St. Petersburg, Russia
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2018
Copyright_xml – notice: Springer Nature Switzerland AG 2018
DEWEY 006.35
DOI 10.1007/978-3-319-99579-3_61
Discipline Engineering
Computer Science
EISBN 9783319995793
3319995790
EISSN 1611-3349
Editor Potapova, Rodmonga
Karpov, Alexey
Jokisch, Oliver
Editor_xml – sequence: 1
  fullname: Karpov, Alexey
– sequence: 2
  fullname: Jokisch, Oliver
– sequence: 3
  fullname: Potapova, Rodmonga
EndPage 605
ISBN 3319995782
9783319995786
ISSN 0302-9743
IsPeerReviewed true
IsScholarly true
LCCallNum Q334-342
Language English
OCLC 1052612707
ORCID 0000-0002-9085-3378
PageCount 10
PublicationCentury 2000
PublicationDate 2018-01-01
PublicationDateYYYYMMDD 2018-01-01
PublicationDate_xml – month: 01
  year: 2018
  text: 2018-01-01
  day: 01
PublicationDecade 2010
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 20th International Conference, SPECOM 2018, Leipzig, Germany, September 18-22, 2018, Proceedings
PublicationTitle Speech and Computer
PublicationYear 2018
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Kleinberg, Jon M.
Mattern, Friedemann
Naor, Moni
Mitchell, John C.
Terzopoulos, Demetri
Steffen, Bernhard
Pandu Rangan, C.
Kanade, Takeo
Kittler, Josef
Weikum, Gerhard
Hutchison, David
Tygar, Doug
RelatedPersons_xml – sequence: 1
  givenname: David
  surname: Hutchison
  fullname: Hutchison, David
  organization: Lancaster University, Lancaster, United Kingdom
– sequence: 2
  givenname: Takeo
  surname: Kanade
  fullname: Kanade, Takeo
  organization: Carnegie Mellon University, Pittsburgh, USA
– sequence: 3
  givenname: Josef
  surname: Kittler
  fullname: Kittler, Josef
  organization: University of Surrey, Guildford, United Kingdom
– sequence: 4
  givenname: Jon M.
  surname: Kleinberg
  fullname: Kleinberg, Jon M.
  organization: Cornell University, Ithaca, USA
– sequence: 5
  givenname: Friedemann
  surname: Mattern
  fullname: Mattern, Friedemann
  organization: ETH Zurich, Zurich, Switzerland
– sequence: 6
  givenname: John C.
  surname: Mitchell
  fullname: Mitchell, John C.
  organization: Stanford University, Stanford, USA
– sequence: 7
  givenname: Moni
  surname: Naor
  fullname: Naor, Moni
  organization: Dept Applied Math & Computer Science, Weizmann Institute of Science, Rehovot, Israel
– sequence: 8
  givenname: C.
  surname: Pandu Rangan
  fullname: Pandu Rangan, C.
  organization: Indian Institute of Technology Madras, Chennai, India
– sequence: 9
  givenname: Bernhard
  surname: Steffen
  fullname: Steffen, Bernhard
  organization: TU Dortmund University, Dortmund, Germany
– sequence: 10
  givenname: Demetri
  surname: Terzopoulos
  fullname: Terzopoulos, Demetri
  organization: University of California, Los Angeles, USA
– sequence: 11
  givenname: Doug
  surname: Tygar
  fullname: Tygar, Doug
  organization: University of California, Berkeley, USA
– sequence: 12
  givenname: Gerhard
  surname: Weikum
  fullname: Weikum, Gerhard
  organization: Max Planck Institute for Informatics, Saarbrücken, Germany
StartPage 596
SubjectTerms Corpus linguistics
Everyday speech
N-gram analysis
Parts of speech
Pragmatic Markers
Probability
Russian
Syntax
Title Quantitative Data on POS Distribution in the Beginnings and the Ends of Utterances in Everyday Russian Speech
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6303795&ppg=612
http://link.springer.com/10.1007/978-3-319-99579-3_61
Volume 11096