English corpus and literary analysis based on statistical language model

In this paper, the cross-language retrieval model based on statistical language model, cross-lingual text categorization method and cross-lingual text clustering method are studied systematically and deeply. Without any help of cross-lingual resources such as machine translation and bilingual dictio...

Full description

Saved in:
Bibliographic Details
Published inCluster computing Vol. 22; no. Suppl 6; pp. 14897 - 14903
Main Authors Huang, Bo, Lan, Xijun
Format Journal Article
LanguageEnglish
Published New York Springer US 01.11.2019
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In this paper, the cross-language retrieval model based on statistical language model, cross-lingual text categorization method and cross-lingual text clustering method are studied systematically and deeply. Without any help of cross-lingual resources such as machine translation and bilingual dictionaries, this paper can solve the many-to-many problem of word translation in CLIR and solve the problem of unregistered words partially. Under a unified framework, a series of topics are extracted from bilingual parallel corpora to form the thematic space for each language. Thematic space of each language exists independently, and the bilingual subject space is established through the bilingual semantic correspondence. The bilingual subject space reflects the semantic correspondence between documents and documents, words and words. It reveals the inherent structure and internal relations among languages and languages.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1386-7857
1573-7543
DOI:10.1007/s10586-018-2454-y