Language patterns in Japanese patients with Alzheimer disease: A machine learning approach
Published in | Psychiatry and clinical neurosciences Vol. 77; no. 5; pp. 273 - 281 |
---|---|
Main Authors | , , , , , , , , |
Format | Journal Article |
Language | English |
Published | Melbourne; John Wiley & Sons Australia, Ltd; 01.05.2023; Wiley Subscription Services, Inc |
Summary: | Aim
The authors applied natural language processing and machine learning to explore disease‐related language patterns that could support objective measures for assessing language ability in Japanese patients with Alzheimer disease (AD). Most previous studies have used large publicly available data sets in Euro‐American languages.
Methods
The authors obtained 276 speech samples from 42 patients with AD and 52 healthy controls, aged 50 years or older. The samples were analyzed with spaCy, a natural language processing library for Python, together with the add‐on library GiNZA, a Japanese parser based on Universal Dependencies, which is designed to facilitate multilingual parser development. eXtreme Gradient Boosting was used as the classification algorithm. Each unit of part‐of‐speech and dependency was tagged and counted to create features such as tag frequency and tag‐to‐tag transition frequency. Each feature's importance was computed during the 100‐fold repeated random subsampling validation and then averaged.
Results
The model achieved an accuracy of 0.84 (SD = 0.06) and an area under the curve of 0.90 (SD = 0.03). Of the 10 most important features for prediction, seven were related to part‐of‐speech and the remaining three to dependency. A box plot analysis showed that the appearance rates of content word–related features were lower among the patients, whereas those of stagnation‐related features were higher.
Conclusion
The current study demonstrated a promising level of accuracy for predicting AD and found the language patterns corresponding to the type of lexical‐semantic decline known as ‘empty speech’, which is regarded as a characteristic of AD. |
---|---|
ISSN: | 1323-1316, 1440-1819 |
DOI: | 10.1111/pcn.13526 |
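
The Methods summary describes counting part‐of‐speech and dependency tags to build tag‐frequency and tag‐to‐tag transition‐frequency features. The following is a minimal sketch of that idea, not the authors' code. It assumes the tags have already been produced by a parser (in spaCy these are exposed as `token.pos_` and `token.dep_`), and it normalizes counts to rates, which is an assumption. The function name `tag_features`, the feature-name format, and the example tag sequence are hypothetical.

```python
from collections import Counter


def tag_features(tags, prefix):
    """Return relative frequencies of single tags and of adjacent tag pairs.

    `tags` is the ordered tag sequence of one speech sample.
    Normalizing by sequence length is an assumption of this sketch.
    """
    features = {}
    if not tags:
        return features

    # Tag frequency: share of tokens carrying each tag.
    for tag, count in Counter(tags).items():
        features[f"{prefix}:{tag}"] = count / len(tags)

    # Tag-to-tag transition frequency: share of adjacent pairs (a -> b).
    pairs = list(zip(tags, tags[1:]))
    for (a, b), count in Counter(pairs).items():
        features[f"{prefix}:{a}>{b}"] = count / len(pairs)

    return features


# Hypothetical Universal Dependencies POS tags for one short utterance.
pos_tags = ["PRON", "ADP", "NOUN", "ADP", "VERB", "AUX"]
features = tag_features(pos_tags, "pos")
```

Feature dictionaries like this one, built per sample for both part‐of‐speech and dependency tags, could then be assembled into a matrix and passed to a gradient boosting classifier.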