Automated Measures of Syntactic Complexity in Natural Speech Production: Older and Younger Adults as a Case Study
This article is a preprint and has not been certified by peer review.
Author(s) / Creator(s)
Agmon, Galit
Pradhan, Sameer
Ash, Sharon
Nevler, Naomi
Liberman, Mark
Grossman, Murray
Cho, Sunghye
Abstract / Description
Purpose: Multiple methods have been suggested for quantifying syntactic complexity in speech. We compared the performance of eight automated syntactic complexity metrics to determine which best captured differences in syntactic complexity between two age groups.
Method: We used natural speech samples produced in a picture description task by younger (n=76) and older (n=36) healthy participants; the samples were manually transcribed and segmented into sentences. We manually verified that older participants produced fewer complex structures. We developed a multi-dimensional metric of syntactic complexity that uses automatically extracted syntactic structures as features. We then compared our metric to seven other methods: Yngve score, Frazier score, Frazier-Roark score, d-level, syntactic frequency, mean dependency distance, and sentence length. We examined how well each method distinguished the age groups of the speakers using logistic regression models. We repeated the same analysis with automatic transcription and segmentation produced by an automatic speech recognition (ASR) system.
Results: Our multi-dimensional metric successfully predicted age group (AUC=0.87) and outperformed all other metrics. Yngve score and sentence length also achieved high AUCs (0.84 each). However, in a fully automated pipeline with ASR, the performance of these two metrics dropped, whereas that of the multi-dimensional metric remained high.
Conclusions: Syntactic complexity in spontaneous speech can be quantified by directly assessing syntactic structures. It can be derived automatically, saving considerable time, cost and effort compared to manually analyzing large-scale corpora, while maintaining high face validity and parsimony.
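Two of the comparison metrics named in the Method paragraph have compact textbook definitions, sketched below in Python. This is an illustrative sketch, not the authors' implementation. It assumes a dependency parse encoded as a list of 1-based head indices (0 marking the root) and a constituency tree encoded as nested tuples whose first element is the node label. Mean dependency distance is the average linear distance between each word and its head. For the Yngve score, in its common formulation, each node's rightmost child scores 0 and each sibling to its left scores one more; a word's score is the sum of these values along its path from the root, and a sentence score is typically the mean over its words. The function names and the example sentence are hypothetical.

```python
from statistics import mean


def mean_dependency_distance(heads):
    """Mean |head - dependent| over all non-root arcs.

    heads[i] is the 1-based index of the head of word i+1; 0 marks the root.
    """
    distances = [abs(head - (i + 1)) for i, head in enumerate(heads) if head != 0]
    return mean(distances)


def yngve_scores(tree):
    """Per-word Yngve scores for a tree of nested tuples (label, child, ...).

    Leaves are plain strings. Children are scored right to left starting at 0,
    and a word's score is the sum of those scores on its path from the root.
    """
    scores = []

    def walk(node, depth):
        if isinstance(node, str):  # a leaf, i.e. a word
            scores.append(depth)
            return
        children = node[1:]
        n = len(children)
        for position, child in enumerate(children):
            # rightmost child adds 0, each child further left adds 1 more
            walk(child, depth + (n - 1 - position))

    walk(tree, 0)
    return scores


# Hypothetical example: "The dog chased a cat"
heads = [2, 3, 0, 5, 3]  # heads of The, dog, chased, a, cat
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "dog")),
        ("VP", ("VBD", "chased"),
               ("NP", ("DT", "a"), ("NN", "cat"))))
print(mean_dependency_distance(heads))  # 1.25
print(yngve_scores(tree))               # [2, 1, 1, 1, 0]
```

Published implementations differ in details such as punctuation handling and whether preterminal nodes are counted, so values from this sketch need not match those reported in the study.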
Keyword(s)
syntactic complexity; speech; aging; natural language processing (NLP); syntax
Date of first publication
2023-08-21
Publisher
PsychArchives
AGMON_syntactic_complexity_metrics_preprint2.pdf (Adobe PDF, 988.92 KB, MD5: 64a3499fc12091ac8722ea12a65adae0)
Version 2, 2023-08-21: Expanded the comparison from three metrics of syntactic complexity to eight, compared the effects of automated vs. manual transcription on the performance of these metrics, and added k-fold cross-validation to the assessment of the metrics' performance.
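The version note mentions k-fold cross-validation of each metric's ability to predict age group. The sketch below shows one conventional way to do this for a single metric using only the Python standard library: a one-feature logistic regression fit by batch gradient descent, stratified folds so that every fold contains both groups, and held-out AUC computed as the probability that a randomly chosen positive case outscores a randomly chosen negative one. The fold count, learning rate, epoch count, seed, and function names are illustrative assumptions, not the authors' settings or code.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(labels, scores):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cv_auc(xs, ys, k=5, seed=0):
    """Mean held-out AUC of a one-metric logistic model across k folds."""
    fold_aucs = []
    for test in stratified_folds(ys, k, seed):
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        w, b = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        probs = [sigmoid(w * xs[i] + b) for i in test]
        fold_aucs.append(auc([ys[i] for i in test], probs))
    return sum(fold_aucs) / len(fold_aucs)
```

Because a one-feature logistic model is monotonic in its input, its held-out AUC largely reflects how well the metric itself ranks speakers; fitting a model matters more when several features are combined, for example in a multi-dimensional metric.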
PsychArchives acquisition timestamp
2023-08-21T08:24:22Z
Made available on
2022-12-30T08:11:11Z
2023-08-21T08:24:22Z
Publication status
other
Review status
not reviewed
Sponsorship
This work was supported by Department of Defense Grant W81XWH-20-1-0531 (awarded to Murray Grossman, Naomi Nevler, and Galit Agmon), National Institute on Aging Grants AG073510-01 (awarded to Naomi Nevler) and AG066597 (awarded to Murray Grossman), and Alzheimer's Association Grant AARF-21-851126 (awarded to Sunghye Cho).
Persistent Identifier
https://hdl.handle.net/20.500.12034/7872.2
https://doi.org/10.23668/psycharchives.13145
Language of content
English
Is version of
https://doi.org/10.1044/2023_JSLHR-23-00009
Dewey Decimal Classification number(s)
150
DRO type
preprint