Preprint

Automated Measures of Syntactic Complexity in Natural Speech Production: Older and Younger Adults as a Case Study

This article is a preprint and has not been certified by peer review.

Author(s) / Creator(s)

Agmon, Galit
Pradhan, Sameer
Ash, Sharon
Nevler, Naomi
Liberman, Mark
Grossman, Murray
Cho, Sunghye

Abstract / Description

Purpose: Multiple methods have been suggested for quantifying syntactic complexity in speech. We compared the performance of eight automated syntactic complexity metrics to determine which best captured differences in syntactic complexity between two age groups.

Method: We used natural speech samples produced in a picture description task by younger (n = 76) and older (n = 36) healthy participants, manually transcribed and segmented into sentences. We manually verified that older participants produced fewer complex structures. We developed a multi-dimensional metric of syntactic complexity that uses automatically extracted syntactic structures as features. We then compared our metric with seven other metrics: Yngve score, Frazier score, Frazier-Roark score, d-level, syntactic frequency, mean dependency distance, and sentence length. We examined how well each metric distinguished the age group of speakers using logistic regression models. We repeated the same analysis with automatic transcription and segmentation using an ASR system.

Results: Our multi-dimensional metric was successful in predicting age group (AUC = 0.87), and it performed better than all the other metrics. High AUCs were also achieved by Yngve score (0.84) and sentence length (0.84). However, in a fully automated pipeline with ASR, the performance of these two metrics dropped, while the performance of the multi-dimensional metric remained high.

Conclusions: Syntactic complexity in spontaneous speech can be quantified by directly assessing syntactic structures. Such a measure can be derived automatically, saving considerable time, cost, and effort compared to manually analyzing large-scale corpora, while maintaining high face validity and parsimony.
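As a rough illustration of two ideas named in the abstract, the sketch below computes mean dependency distance (one of the eight compared metrics) for a sentence and then scores how well a single metric separates two groups using AUC. It is not the authors' pipeline: the parses and group scores are hypothetical hand-written values, the function names are invented for this example, and the AUC is computed directly from pairwise rankings rather than from a fitted, cross-validated logistic regression. For a single predictor with a positive coefficient, a logistic regression is a monotonic transform of the score, so the ranking-based AUC is the same.

```python
from itertools import product


def mean_dependency_distance(heads):
    """Average absolute distance between each token and its syntactic head.

    heads[i] is the 1-based position of the head of token i + 1;
    0 marks the root, which has no head and is skipped.
    """
    distances = [abs(head - position)
                 for position, head in enumerate(heads, start=1)
                 if head != 0]
    return sum(distances) / len(distances) if distances else 0.0


def auc(positive_scores, negative_scores):
    """Probability that a random positive case outscores a random negative one.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for pos, neg in product(positive_scores, negative_scores):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical hand-written parses (not data from the study).
simple = [2, 3, 0]          # "The dog barked"
relative = [2, 5, 4, 2, 0]  # "The boy who fell cried"

print(mean_dependency_distance(simple))    # 1.0
print(mean_dependency_distance(relative))  # 1.75

# Hypothetical per-speaker metric values for two groups.
group_a = [1.75, 2.0, 1.5]
group_b = [1.0, 1.5, 1.2]
print(round(auc(group_a, group_b), 3))     # 0.944
```

An AUC of 0.5 would mean the metric carries no information about group membership, and 1.0 would mean perfect separation. In practice the head indices would come from an automatic dependency parser rather than being written by hand.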

Keyword(s)

syntactic complexity
speech
aging
natural language processing (NLP)
syntax

Persistent Identifier

https://hdl.handle.net/20.500.12034/7872.2
https://doi.org/10.23668/psycharchives.13145

Date of first publication

2023-08-21

Publisher

PsychArchives

Is version of

https://doi.org/10.1044/2023_JSLHR-23-00009

Citation

  • 2
    2023-08-21
    Expanded the comparison from three metrics of syntactic complexity to eight, compared the effects of automated vs. manual transcription on the performance of these metrics, and added k-fold cross-validation to the assessment of the metrics' performance.
  • 1
    2022-12-30
  • PsychArchives acquisition timestamp
    2023-08-21T08:24:22Z
  • Made available on
    2022-12-30T08:11:11Z
  • Made available on
    2023-08-21T08:24:22Z
  • Publication status
    other
  • Review status
    notReviewed
  • Sponsorship
    This work was supported by Department of Defense Grant W81XWH-20-1-0531 (awarded to Murray Grossman, Naomi Nevler, and Galit Agmon), National Institute on Aging Grants AG073510-01 (awarded to Naomi Nevler) and AG066597 (awarded to Murray Grossman), and Alzheimer's Association Grant AARF-21-851126 (awarded to Sunghye Cho).
    en
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/7872.2
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.13145
  • Language of content
    eng
    en_US
  • Is version of
    https://doi.org/10.1044/2023_JSLHR-23-00009
  • Dewey Decimal Classification number(s)
    150
  • DRO type
    preprint
    en_US