Interpretable speech features vs. DNN embeddings: What to use in the automatic assessment of Parkinson's disease in multi-lingual scenarios

Comput Biol Med. 2023 Nov:166:107559. doi: 10.1016/j.compbiomed.2023.107559. Epub 2023 Oct 12.

Abstract

Speech-based approaches for assessing Parkinson's Disease (PD) often rely on feature extraction for automatic classification or detection. While many studies prioritize accuracy by using non-interpretable embeddings from Deep Neural Networks, this work aims to explore the predictive capabilities and language robustness of both feature types in a systematic fashion. As interpretable features, prosodic, linguistic, and cognitive descriptors were adopted, while x-vectors, Wav2Vec 2.0, HuBERT, and TRILLsson representations were used as non-interpretable features. Mono-lingual, multi-lingual, and cross-lingual machine learning experiments were conducted leveraging six data sets comprising speech recordings from various languages: American English, Castilian Spanish, Colombian Spanish, Italian, German, and Czech. For interpretable feature-based models, the mean of the best F1-scores obtained from each language was 81% in mono-lingual, 81% in multi-lingual, and 71% in cross-lingual experiments. For non-interpretable feature-based models, instead, they were 85% in mono-lingual, 88% in multi-lingual, and 79% in cross-lingual experiments. Firstly, models based on non-interpretable features outperformed interpretable ones, especially in cross-lingual experiments. Specifically, TRILLsson provided the most stable and accurate results across tasks and data sets. Conversely, the two types of features adopted showed some level of language robustness in multi-lingual and cross-lingual experiments. Overall, these results suggest that interpretable feature-based models can be used by clinicians to evaluate the deterioration of the speech of patients with PD, while non-interpretable feature-based models can be leveraged to achieve higher detection accuracy.

Keywords: Deep learning; Interpretable features; Machine learning; Parkinson’s disease; Speech.

MeSH terms

  • Aged
  • Female
  • Humans
  • Language
  • Male
  • Middle Aged
  • Multilingualism
  • Neural Networks, Computer
  • Parkinson Disease* / physiopathology
  • Speech