Comparison of directed and weighted co-occurrence networks of six languages
To study commonalities and differences among different languages, we select 100 reports from the documents of the United Nations, each of which was written in Arabic, Chinese, English, French, Russian and Spanish languages, separately. Based on these corpora, we construct 6 weighted and directed wor...
Saved in:
Published in | Physica A Vol. 393; pp. 579 - 589 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
Elsevier B.V
01.01.2014
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | To study commonalities and differences among different languages, we select 100 reports from the documents of the United Nations, each of which was written in Arabic, Chinese, English, French, Russian and Spanish languages, separately. Based on these corpora, we construct 6 weighted and directed word co-occurrence networks. Besides all the networks exhibit scale-free and small-world features, we find several new non-trivial results, including connections among English words are denser, and the expression of English language is more flexible and powerful; the connection way among Spanish words is more stringent and this indicates that the Spanish grammar is more rigorous; values of many statistical parameters of the French and Spanish networks are very approximate and this shows that these two languages share many commonalities; Arabic and Russian words have many varieties, which result in rich types of words and a sparse connection among words; connections among Chinese words obey a more uniform distribution, and one inclines to use the least number of Chinese words to express the same complex information as those in other five languages. This shows that the expression of Chinese language is quite concise. In addition, several topics worth further investigating by the complex network approach have been observed in this study.
•The English word connections are denser and its expression is more flexible.•Statistical data have shown that French and Spanish languages share many commonalities.•Statistical data have shown that Chinese and English languages share many commonalities.•Arabic and Russian word connections are sparse.•Chinese word connections obey a more uniform distribution. |
---|---|
Bibliography: | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23 |
ISSN: | 0378-4371 1873-2119 |
DOI: | 10.1016/j.physa.2013.08.075 |