Quantitative Data on POS Distribution in the Beginnings and the Ends of Utterances in Everyday Russian Speech

Bibliographic Details
Published in	Speech and Computer Vol. 11096; pp. 596 - 605
Main Author	Sherstinova, Tatiana
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 01.01.2018
Series	Lecture Notes in Computer Science
Subjects	Corpus linguistics; Everyday speech; N-gram analysis; Parts of speech; Pragmatic markers; Probability; Russian; Syntax
ISBN	3319995782 9783319995786
ISSN	0302-9743 1611-3349
DOI	10.1007/978-3-319-99579-3_61

Abstract	The paper presents statistical data on POS distribution at the beginnings and ends of everyday Russian utterances. The material for this study was a morphologically annotated subcorpus of the ORD corpus of spoken Russian, comprising 149,737 tokens and containing fragments of everyday speech from 213 people of different genders, ages, and professional groups. The study used n-gram analysis, a method typically employed in computational linguistics to construct probabilistic language models. In the subcorpus as a whole, the most frequent POS turned out to be verbs (17.23%), personal pronouns (15.60%), nouns (14%), particles (13%), and conjunctions (9%). In the initial position of spoken utterances, however, the most frequent POS are particles (19.99%) and conjunctions (12%), while in the final position verbs and nouns are used more often than other POS. Of these, verbs are more typical of interrogative (27.66%) and narrative (25.42%) utterances, and nouns are frequently used in exclamative (29.95%) and narrative (24.28%) utterances. In addition, the most typical bigrams and trigrams that begin utterances with a particle are presented along with their probabilities. The high percentage of syntactic models with a particle in utterance-initial position suggests that these units have special pragmatic functions associated with marking phrase boundaries. The statistical data obtained may be used to model everyday utterances for a variety of dialogue systems and to improve Russian speech recognition systems.
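The positional counts described in the abstract can be illustrated with a minimal sketch. It is not the author's procedure or data: the tag names (`PART`, `CONJ`, `PRON`, `VERB`, `NOUN`), the four toy utterances, and the helper functions `position_distribution` and `initial_ngram_probs` are all hypothetical, chosen only to show how relative frequencies of POS tags at utterance boundaries and of utterance-initial n-grams might be computed.

```python
from collections import Counter

# Hypothetical toy data: each utterance is a list of POS tags.
# These tags and sequences are illustrative, not taken from the ORD corpus.
utterances = [
    ["PART", "PRON", "VERB"],
    ["CONJ", "PRON", "VERB", "NOUN"],
    ["PART", "VERB"],
    ["PART", "PRON", "NOUN"],
]


def position_distribution(utts, index):
    """Relative frequency of POS tags at one position (0 = first, -1 = last)."""
    counts = Counter(u[index] for u in utts if u)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def initial_ngram_probs(utts, n):
    """Relative frequency of the POS n-gram that opens each utterance."""
    grams = Counter(tuple(u[:n]) for u in utts if len(u) >= n)
    total = sum(grams.values())
    return {gram: c / total for gram, c in grams.items()}


initial = position_distribution(utterances, 0)
final = position_distribution(utterances, -1)
bigrams = initial_ngram_probs(utterances, 2)

print("initial:", initial)
print("final:", final)
print("initial bigrams:", bigrams)
```

On real material the same counts would be taken over the annotated subcorpus, with utterance boundaries and the tagset supplied by the corpus annotation rather than by hand.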
Author details	Sherstinova, Tatiana; ORCID 0000-0002-9085-3378; t.sherstinova@spbu.ru; National Research University Higher School of Economics, St. Petersburg, Russia
Copyright	Springer Nature Switzerland AG 2018
DEWEY	006.35
Discipline	Engineering Computer Science
EISBN	9783319995793 3319995790
Editor	Karpov, Alexey; Jokisch, Oliver; Potapova, Rodmonga
IsPeerReviewed	true
IsScholarly	true
LCCallNum	Q334-342
OCLC	1052612707
PageCount	10
PublicationPlace	Cham, Switzerland
PublicationSeriesSubtitle	Lecture Notes in Artificial Intelligence
PublicationSubtitle	20th International Conference, SPECOM 2018, Leipzig, Germany, September 18-22, 2018, Proceedings
RelatedPersons	Hutchison, David (Lancaster University, Lancaster, United Kingdom); Kanade, Takeo (Carnegie Mellon University, Pittsburgh, USA); Kittler, Josef (University of Surrey, Guildford, United Kingdom); Kleinberg, Jon M. (Cornell University, Ithaca, USA); Mattern, Friedemann (ETH Zurich, Zurich, Switzerland); Mitchell, John C. (Stanford University, Stanford, USA); Naor, Moni (Dept Applied Math & Computer Science, Weizmann Institute of Science, Rehovot, Israel); Pandu Rangan, C. (Indian Institute of Technology Madras, Chennai, India); Steffen, Bernhard (TU Dortmund University, Dortmund, Germany); Terzopoulos, Demetri (University of California, Los Angeles, USA); Tygar, Doug (University of California, Berkeley, USA); Weikum, Gerhard (Max Planck Institute for Informatics, Saarbrücken, Germany)