Lexical and syntactic features of academic Russian texts: a discriminant analysis

Bibliographic Details
Published in: RESEARCH RESULT Theoretical and Applied Linguistics, Vol. 8, no. 4
Main authors: Kupriyanov, Roman V.; Solnyshkina, Marina I.; Dascalu, Mihai; Soldatkina, Tatyana A.
Format: Journal article
Language: English
Published: 30.12.2022
Online access: http://rrlinguistics.ru/journal/download/2976
ISSN: 2313-8912
DOI: 10.18413/2313-8912-2022-8-4-0-8

Abstract
This article presents three mathematical models that differentiate Russian-language academic texts from three subject discourses (Philology, Mathematics, and Natural Sciences) and thereby enable the design and automated profiling of the corresponding text typologies. The models include five indices: one surface-level feature (sentence length) and four syntactic features (mean verbs per sentence, mean adjectives per sentence, local noun overlap, and global argument overlap). We identified and validated these five statistically significant features out of 45 linguistic features extracted from our research corpus of 91,185 tokens. The shortest sentences are found in Russian language textbooks, while the longest sentences occur in Natural Science texts. The mean numbers of verbs, nouns, and adjectives per sentence are highest in Natural Science textbooks, whereas Mathematics discourse is characterized by the shortest word length, the highest local noun overlap, and the highest global argument overlap. We attribute the metric differences between the three discourses to their functions: Natural Science texts are characterized by descriptive and narrative passages, whereas Philology texts are associated with expressing opinions. Mathematical discourse relies on precise definitions, explanations, and justifications, which results in numerous overlaps. The discriminant analysis built on these features supports the development of text profilers for parametric analysis: automating the features and applying the provided classification formulas make it possible to design text profilers for textbook writing and editing. Our findings can help professional linguists, technologists, and academic writers select and adapt texts for their target audience.
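The abstract names the five features and the discriminant classification step but does not define how they are computed. The Python sketch below shows one plausible way to compute them and to apply linear classification functions of the kind a discriminant analysis produces. It is not the authors' implementation. Several details are assumptions: the input format (sentences already tagged as lists of (lemma, UPOS) pairs by some Russian tagger), sentence length counted in non-punctuation tokens, Coh-Metrix-style binary definitions of local noun overlap (adjacent sentences) and global argument overlap (all sentence pairs, nouns and pronouns), and all function names. The classification coefficients are not given in this record, so `classify` takes them as a parameter.

```python
from itertools import combinations

# Assumption: each sentence is a list of (lemma, UPOS tag) pairs produced by
# some Russian POS tagger; tagging itself is outside the scope of this sketch.
Sentence = list[tuple[str, str]]


def _lemmas(sentence: Sentence, tags: set[str]) -> set[str]:
    """Return the lowercased lemmas whose UPOS tag is in `tags`."""
    return {lemma.lower() for lemma, tag in sentence if tag in tags}


def sentence_length(sentences: list[Sentence]) -> float:
    """Mean number of non-punctuation tokens per sentence (assumes >= 1 sentence)."""
    return sum(sum(1 for _, t in s if t != "PUNCT") for s in sentences) / len(sentences)


def mean_count(sentences: list[Sentence], tag: str) -> float:
    """Mean number of tokens per sentence carrying the given UPOS tag."""
    return sum(sum(1 for _, t in s if t == tag) for s in sentences) / len(sentences)


def local_noun_overlap(sentences: list[Sentence]) -> float:
    """Share of adjacent sentence pairs that share at least one noun lemma."""
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if _lemmas(a, {"NOUN"}) & _lemmas(b, {"NOUN"}))
    return hits / len(pairs)


def global_argument_overlap(sentences: list[Sentence]) -> float:
    """Share of all sentence pairs sharing at least one noun or pronoun lemma."""
    args = [_lemmas(s, {"NOUN", "PRON"}) for s in sentences]
    pairs = list(combinations(args, 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)


def profile(sentences: list[Sentence]) -> list[float]:
    """Return the five features in a fixed order."""
    return [
        sentence_length(sentences),
        mean_count(sentences, "VERB"),
        mean_count(sentences, "ADJ"),
        local_noun_overlap(sentences),
        global_argument_overlap(sentences),
    ]


def classify(features: list[float], functions: dict[str, tuple[float, list[float]]]) -> str:
    """Return the label whose linear classification function scores highest.

    `functions` maps each class label to (constant, weights). The weights must
    be estimated from a labelled corpus; they are NOT supplied in this sketch.
    """
    def score(label: str) -> float:
        const, weights = functions[label]
        return const + sum(w * x for w, x in zip(weights, features))

    return max(functions, key=score)


if __name__ == "__main__":
    # Toy, hand-tagged input for illustration only.
    sample = [
        [("функция", "NOUN"), ("быть", "VERB"), ("непрерывный", "ADJ"), (".", "PUNCT")],
        [("функция", "NOUN"), ("иметь", "VERB"), ("предел", "NOUN"), (".", "PUNCT")],
        [("она", "PRON"), ("определить", "VERB"), (".", "PUNCT")],
    ]
    print(profile(sample))
```

Keeping the feature functions separate from `classify` mirrors the two stages the abstract describes: a profiler that extracts parameters, and classification formulas applied on top of them. Any coefficients passed to `classify` in practice would have to come from the article's own discriminant functions or from refitting on a labelled corpus.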

Affiliations: Polytechnic University of Bucharest; Research Laboratory “Expert Systems for Processing Language Structures and Vibroacoustics”, Kazan Federal University; Text Analytics Laboratory, Kazan Federal University