Time–Frequency Causal Hidden Markov Model for speech-based Alzheimer’s disease longitudinal detection

Bibliographic Details
Published in	Computer Speech & Language Vol. 95; p. 101862
Main Authors	Pan, Yilin, Li, Jiabing, Zhang, Yating, Tian, Zhuoran, Zhang, Yijia, Lu, Mingyu
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.01.2026
Subjects	Alzheimer’s disease; Causal Hidden Markov Model; Longitudinal tracking; Parallel convolutional neural network; Spectrogram; Time–frequency acoustic feature

More Information
Summary:	Speech deterioration is an early indicator in individuals with Alzheimer’s disease (AD), with progression influenced by various factors, leading to unique trajectories for each individual. To facilitate automated longitudinal detection of AD using speech, we propose an enhanced Hidden Markov Model (HMM), termed the Time–Frequency Causal HMM (TF-CHMM), which models disease-causative acoustic features over time under the Markov property. The TF-CHMM integrates a parallel convolutional neural network as an encoder for spectrograms, extracting both time-domain and frequency-domain features from audio recordings linked to AD. Additionally, it incorporates personal attributes (e.g., age) and clinical diagnosis data (e.g., MMSE scores) as supplementary inputs, disentangling disease-related features from unrelated components through a sequential variational auto-encoder with causal inference. The TF-CHMM is evaluated on the Pitt Corpus, which includes annual visits for each subject, with a variable number of longitudinal samples per subject comprising audio recordings, manual transcriptions, MMSE scores, and age information. Experimental results demonstrated the effectiveness of the proposed system, which achieved a competitive accuracy of 90.24% and an F1 score of 90.00%. An ablation study further highlighted the efficiency of the parallel convolutional kernels in extracting time–frequency information and the effectiveness of the longitudinal experimental setup in the AD detection system.
• A parallel convolutional block is designed to extract time–frequency AD features.
• Disease-related and disease-unrelated information is disentangled causally.
• The longitudinal modeling system achieves state-of-the-art (SOTA) performance on the Pitt Corpus.
ISSN:	0885-2308
DOI:	10.1016/j.csl.2025.101862
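The summary says the TF-CHMM encoder runs parallel convolutional kernels over the spectrogram to capture time-domain and frequency-domain information. It does not give kernel shapes. The sketch below assumes one branch with a 1 × k kernel sliding along time frames and one with a k × 1 kernel sliding along frequency bins. Both kernels are hand-set difference filters, whereas a trained network would learn them. The names `conv2d_same`, `parallel_block`, and `global_average_pool` and the toy spectrogram are hypothetical. This illustrates the general idea of parallel time- and frequency-oriented kernels and is not the authors' architecture.

```python
from typing import List

# A spectrogram as a list of rows: row i = frequency bin i, column j = time frame j.
Matrix = List[List[float]]


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """2-D cross-correlation (no kernel flip, as in CNN layers) with zero
    'same' padding, so the output keeps the input's shape. Assumes odd kernel sizes."""
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(x), len(x[0])

    def at(i: int, j: int) -> float:
        # Zero padding outside the spectrogram.
        return x[i][j] if 0 <= i < h and 0 <= j < w else 0.0

    return [
        [
            sum(k[a][b] * at(i + a - ph, j + b - pw) for a in range(kh) for b in range(kw))
            for j in range(w)
        ]
        for i in range(h)
    ]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def parallel_block(spec: Matrix, time_kernel: Matrix, freq_kernel: Matrix) -> List[Matrix]:
    """Hypothetical parallel block: apply a time-oriented (1 x k) and a
    frequency-oriented (k x 1) kernel to the same input side by side and
    stack the two feature maps as channels."""
    time_map = relu(conv2d_same(spec, time_kernel))  # slides along time frames
    freq_map = relu(conv2d_same(spec, freq_kernel))  # slides along frequency bins
    return [time_map, freq_map]


def global_average_pool(channels: List[Matrix]) -> List[float]:
    """Collapse each channel to its mean, giving a fixed-length feature vector."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]


# Toy 3-bin x 4-frame spectrogram (made-up values, not data from the paper).
spec = [
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 2.0, 2.0, 0.0],
    [0.0, 3.0, 3.0, 0.0],
]
# Hand-set difference filters; in a trained CNN these weights would be learned.
time_kernel = [[-1.0, 0.0, 1.0]]      # 1 x 3: energy change over time
freq_kernel = [[-1.0], [0.0], [1.0]]  # 3 x 1: energy change across frequency

channels = parallel_block(spec, time_kernel, freq_kernel)
features = global_average_pool(channels)
print(features)  # [1.0, 0.6666666666666666]
```

On this toy input, the time branch fires where energy rises from one frame to the next (the onset in frames 0 to 1). The frequency branch fires where energy rises from one bin to the next. Running the two kernel orientations in parallel, rather than stacking them in sequence, keeps each view separate until the channels are combined.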