Automated Measures of Syntactic Complexity in Natural Speech Production: Older and Younger Adults as a Case Study

Agmon, Galit; Pradhan, Sameer; Ash, Sharon; Nevler, Naomi; Liberman, Mark; Grossman, Murray; Cho, Sunghye

Preprint

This article is a preprint and has not been certified by peer review.

Abstract / Description

Purpose: Multiple methods have been suggested for quantifying syntactic complexity in speech. We compared the performance of eight automated syntactic complexity metrics to determine which best captured differences in syntactic complexity between two age groups.

Method: We used natural speech samples produced in a picture description task by younger (n = 76) and older (n = 36) healthy participants, manually transcribed and segmented into sentences. We manually verified that older participants produced fewer complex structures. We developed a multi-dimensional metric of syntactic complexity that uses automatically extracted syntactic structures as features. We then compared our metric with seven other methods: Yngve score, Frazier score, Frazier-Roark score, d-level, syntactic frequency, mean dependency distance, and sentence length. We examined how well each method distinguished the age group of speakers using logistic regression models. We repeated the same analysis with automatic transcription and segmentation using an ASR system.

Results: Our multi-dimensional metric was successful in predicting age group (AUC = 0.87), and it performed better than all the other metrics. The Yngve score (0.84) and sentence length (0.84) also achieved high AUCs. However, in a fully automated pipeline with ASR, their performance dropped, while the performance of the multi-dimensional metric remained high.

Conclusions: Syntactic complexity in spontaneous speech can be quantified by directly assessing syntactic structures. Such a measure can be derived automatically, saving considerable time, cost, and effort compared to manually analyzing large-scale corpora, while maintaining high face validity and parsimony.
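As a rough illustration of two quantities named in the abstract, the sketch below computes a mean dependency distance for one sentence and a rank-based AUC for two groups of scores. It is not the authors' pipeline. The function names, the head-index convention (1-based positions, 0 for the root), the choice to exclude the root, and all input values are assumptions made for this example. The AUC here is computed on raw scores rather than on logistic regression outputs as in the study.

```python
from typing import Sequence


def mean_dependency_distance(heads: Sequence[int]) -> float:
    """Mean absolute distance between each word and its syntactic head.

    heads[i] is the 1-based position of the head of word i + 1.
    A head of 0 marks the root, which is excluded (assumed convention).
    """
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    return sum(distances) / len(distances) if distances else 0.0


def auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hand-annotated toy parse of "The dog chased the cat" (illustrative only):
# The -> dog, dog -> chased, chased -> ROOT, the -> cat, cat -> chased.
toy_heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(toy_heads))  # 1.25

# Made-up per-speaker scores, NOT data from the study.
group_a = [2.0, 1.5, 1.25]
group_b = [1.0, 1.25]
print(auc(group_a, group_b))
```

In practice the head indices would come from an automatic dependency parser, and a per-speaker score would typically be an average over that speaker's sentences. Both of those steps are left out here.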

Keyword(s)

syntactic complexity; speech; aging; natural language processing (NLP); syntax

Persistent Identifier

https://doi.org/10.23668/psycharchives.13145

Date of first publication

2023-08-21

Publisher

PsychArchives

Is version of

https://doi.org/10.1044/2023_JSLHR-23-00009

AGMON_syntactic_complexity_metrics_preprint2.pdf

Adobe PDF - 988.92KB

MD5: 64a3499fc12091ac8722ea12a65adae0

Sharing Level 0 (Public Use) CC-BY 4.0

Version 2 (2023-08-21): Expanded the comparison from three metrics of syntactic complexity to eight, compared the effects of automated vs. manual transcription on the performance of these metrics, and added k-fold cross-validation to the assessment of the metrics' performance.

Version 1 (2022-12-30)

PsychArchives acquisition timestamp

2023-08-21T08:24:22Z
Made available on

2022-12-30T08:11:11Z
Made available on

2023-08-21T08:24:22Z
Publication status

other
Review status

notReviewed
Sponsorship

This work was supported by Department of Defense Grant W81XWH-20-1-0531 (awarded to Murray Grossman, Naomi Nevler, and Galit Agmon), National Institute on Aging Grants AG073510-01 (awarded to Naomi Nevler) and AG066597 (awarded to Murray Grossman), and Alzheimer's Association Grant AARF-21-851126 (awarded to Sunghye Cho).
Persistent Identifier

https://hdl.handle.net/20.500.12034/7872.2
Language of content

eng

Dewey Decimal Classification number(s)

150
DRO type

preprint
