Comparison of directed and weighted co-occurrence networks of six languages

Bibliographic Details
Published in Physica A Vol. 393; pp. 579-589
Main Authors Gao, Yuyang; Liang, Wei; Shi, Yuming; Huang, Qiuling
Format Journal Article
Language English
Published Elsevier B.V. 01.01.2014
Summary: To study commonalities and differences among languages, we select 100 reports from the documents of the United Nations, each written separately in Arabic, Chinese, English, French, Russian and Spanish. Based on these corpora, we construct six weighted and directed word co-occurrence networks. Besides confirming that all the networks exhibit scale-free and small-world features, we find several new non-trivial results: connections among English words are denser, and expression in English is more flexible and powerful; the way Spanish words connect is more stringent, which indicates that Spanish grammar is more rigorous; the values of many statistical parameters of the French and Spanish networks are very close, which shows that these two languages share many commonalities; Arabic and Russian words have many varieties, which results in rich types of words and sparse connections among words; and connections among Chinese words obey a more uniform distribution, with Chinese tending to use the fewest words to express the same complex information as the other five languages, which shows that expression in Chinese is quite concise. In addition, this study identifies several topics worth further investigation with the complex network approach.
• English word connections are denser and English expression is more flexible.
• Statistical data show that French and Spanish share many commonalities.
• Statistical data show that Chinese and English share many commonalities.
• Arabic and Russian word connections are sparse.
• Chinese word connections obey a more uniform distribution.
ISSN: 0378-4371, 1873-2119
DOI: 10.1016/j.physa.2013.08.075
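
The summary mentions weighted and directed word co-occurrence networks and compares how dense the word connections are, but this record does not give the construction details. The following is a minimal sketch in Python under stated assumptions, not the authors' method: words are split on whitespace, an edge runs from each word to the word immediately following it in the same sentence, and the edge weight is the number of times that ordered pair occurs. The function names `build_cooccurrence_network` and `density` and the sample sentences are hypothetical.

```python
from collections import Counter


def build_cooccurrence_network(sentences):
    """Map each ordered word pair (source, target) to its weight.

    Assumption: only immediately adjacent words within a sentence are
    linked, and tokens are lowercased and split on whitespace.
    """
    edges = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pair every token with its successor: w1 -> w2, w2 -> w3, ...
        for source, target in zip(tokens, tokens[1:]):
            edges[(source, target)] += 1
    return edges


def density(edges):
    """Distinct directed edges divided by n * (n - 1) possible edges.

    Assumption: self-loops (a word followed by itself) are not treated
    specially; the sample data below contains none.
    """
    nodes = {word for pair in edges for word in pair}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


# Hypothetical sample text, not taken from the United Nations corpora.
sample = ["the council adopted the report", "the report was adopted"]
network = build_cooccurrence_network(sample)
```

In this sketch a denser network simply means that a larger share of the possible ordered word pairs actually occurs. The statistical parameters compared in the article may be defined differently.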