Quantitative Data on POS Distribution in the Beginnings and the Ends of Utterances in Everyday Russian Speech
Published in | Speech and Computer Vol. 11096; pp. 596 - 605 |
---|---|
Main Author | Sherstinova, Tatiana |
Format | Book Chapter |
Language | English |
Published | Switzerland; Springer International Publishing AG; 01.01.2018; Springer International Publishing |
Series | Lecture Notes in Computer Science |
Subjects | |
ISBN | 3319995782 9783319995786 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-319-99579-3_61 |
Abstract | The paper presents statistical data on the distribution of parts of speech (POS) at the beginnings and ends of utterances in everyday Russian speech. The material is a morphologically annotated subcorpus of the ORD corpus of spoken Russian, comprising 149,737 tokens and containing fragments of everyday speech from 213 people of different genders, ages, and professional groups. The study uses n-gram analysis, a method typically employed in computational linguistics to construct probabilistic language models. In the subcorpus as a whole, the most frequent POS are verbs (17.23%), personal pronouns (15.60%), nouns (14%), particles (13%), and conjunctions (9%). In utterance-initial position, however, the most frequent POS are particles (19.99%) and conjunctions (12%), while in utterance-final position verbs and nouns are used more often than other POS. Verbs are more typical of interrogative (27.66%) and narrative (25.42%) utterances, and nouns are frequent in exclamative (29.95%) and narrative (24.28%) utterances. The most typical utterance-initial bigrams and trigrams that start with a particle are also presented, together with their probabilities. The high percentage of syntactic models with a particle in utterance-initial position suggests that these units have special pragmatic functions associated with marking phrase boundaries. The statistical data obtained may be used to model everyday utterances for a variety of dialogue systems and to improve Russian speech recognition systems. |
---|---|
Author | Sherstinova, Tatiana |
Author_xml | – sequence: 1 givenname: Tatiana orcidid: 0000-0002-9085-3378 surname: Sherstinova fullname: Sherstinova, Tatiana email: t.sherstinova@spbu.ru organization: National Research University Higher School of Economics, St. Petersburg, Russia |
ContentType | Book Chapter |
Copyright | Springer Nature Switzerland AG 2018 |
Copyright_xml | – notice: Springer Nature Switzerland AG 2018 |
DBID | FFUUA |
DEWEY | 006.35 |
DOI | 10.1007/978-3-319-99579-3_61 |
DatabaseName | ProQuest Ebook Central - Book Chapters - Demo use only |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Computer Science |
EISBN | 9783319995793 3319995790 |
EISSN | 1611-3349 |
Editor | Potapova, Rodmonga; Karpov, Alexey; Jokisch, Oliver |
Editor_xml | – sequence: 1 fullname: Karpov, Alexey – sequence: 2 fullname: Jokisch, Oliver – sequence: 3 fullname: Potapova, Rodmonga |
EndPage | 605 |
ISBN | 3319995782 9783319995786 |
ISSN | 0302-9743 |
IngestDate | Tue Jul 29 20:12:42 EDT 2025 Thu May 29 00:43:52 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | Q334-342 |
Language | English |
OCLC | 1052612707 |
ORCID | 0000-0002-9085-3378 |
PQID | EBC6303795_745_612 |
PageCount | 10 |
ParticipantIDs | springer_books_10_1007_978_3_319_99579_3_61 proquest_ebookcentralchapters_6303795_745_612 |
PublicationCentury | 2000 |
PublicationDate | 2018-01-01 |
PublicationDateYYYYMMDD | 2018-01-01 |
PublicationDate_xml | – month: 01 year: 2018 text: 2018-01-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland – name: Cham |
PublicationSeriesSubtitle | Lecture Notes in Artificial Intelligence |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | 20th International Conference, SPECOM 2018, Leipzig, Germany, September 18-22, 2018, Proceedings |
PublicationTitle | Speech and Computer |
PublicationYear | 2018 |
Publisher | Springer International Publishing AG Springer International Publishing |
Publisher_xml | – name: Springer International Publishing AG – name: Springer International Publishing |
RelatedPersons | Kleinberg, Jon M.; Mattern, Friedemann; Naor, Moni; Mitchell, John C.; Terzopoulos, Demetri; Steffen, Bernhard; Pandu Rangan, C.; Kanade, Takeo; Kittler, Josef; Weikum, Gerhard; Hutchison, David; Tygar, Doug |
RelatedPersons_xml | – sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David organization: Lancaster University, Lancaster, United Kingdom – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo organization: Carnegie Mellon University, Pittsburgh, USA – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef organization: University of Surrey, Guildford, United Kingdom – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. organization: Cornell University, Ithaca, USA – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann organization: ETH Zurich, Zurich, Switzerland – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. organization: Stanford University, Stanford, USA – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni organization: Dept Applied Math & Computer Science, Weizmann Institute of Science, Rehovot, Israel – sequence: 8 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. organization: Indian Institute of Technology Madras, Chennai, India – sequence: 9 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard organization: TU Dortmund University, Dortmund, Germany – sequence: 10 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri organization: University of California, Los Angeles, USA – sequence: 11 givenname: Doug surname: Tygar fullname: Tygar, Doug organization: University of California, Berkeley, USA – sequence: 12 givenname: Gerhard surname: Weikum fullname: Weikum, Gerhard organization: Max Planck Institute for Informatics, Saarbrücken, Germany |
SourceID | springer proquest |
SourceType | Publisher |
StartPage | 596 |
SubjectTerms | Corpus linguistics Everyday speech N-gram analysis Parts of speech Pragmatic Markers Probability Russian Syntax |
Title | Quantitative Data on POS Distribution in the Beginnings and the Ends of Utterances in Everyday Russian Speech |
URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6303795&ppg=612 http://link.springer.com/10.1007/978-3-319-99579-3_61 |
Volume | 11096 |
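
The positional n-gram analysis described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `positional_pos_stats`, the tag labels, and the toy utterances are all hypothetical, and the sketch assumes each utterance is already available as a list of POS tags. It counts the POS n-grams that open and close each utterance and converts the counts to relative frequencies.

```python
from collections import Counter


def positional_pos_stats(utterances, n=1):
    """Return relative frequencies of POS n-grams at utterance starts and ends.

    `utterances` is a list of POS-tag lists, one list per utterance
    (an assumed input format, not the ORD corpus format).
    """
    initial, final = Counter(), Counter()
    for tags in utterances:
        if len(tags) < n:
            continue  # utterance is too short to hold an n-gram of this order
        initial[tuple(tags[:n])] += 1   # n-gram that opens the utterance
        final[tuple(tags[-n:])] += 1    # n-gram that closes the utterance
    total = sum(initial.values())       # number of utterances actually counted
    if total == 0:
        return {}, {}
    initial_freq = {gram: count / total for gram, count in initial.items()}
    final_freq = {gram: count / total for gram, count in final.items()}
    return initial_freq, final_freq


# Hypothetical toy data; the tags are illustrative, not the ORD tag set.
toy = [
    ["PART", "PRON", "VERB"],
    ["CONJ", "PRON", "VERB", "NOUN"],
    ["PART", "VERB"],
    ["PRON", "VERB", "NOUN"],
]

initial_uni, final_uni = positional_pos_stats(toy, n=1)
initial_bi, _ = positional_pos_stats(toy, n=2)
```

On this toy input, half of the utterances open with a particle, and the final position is split evenly between verbs and nouns. Setting `n=2` or `n=3` gives the utterance-initial bigram and trigram frequencies in the same way.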