Automated Measures of Syntactic Complexity in Natural Speech Production: Older and Younger Adults as a Case Study

Agmon, Galit; Pradhan, Sameer; Ash, Sharon; Nevler, Naomi; Liberman, Mark; Grossman, Murray; Cho, Sunghye

This is not the latest version of this Digital Research Object (DRO). A newer version (version 2, dated 2023-08-21) is available.

Preprint

This article is a preprint and has not been certified by peer review.

Abstract / Description

There is no consensus on what syntactic complexity is or how it can be quantified in spontaneous speech. In the cognitive literature, complex syntactic structures have usually been studied using detailed linguistic comparisons. However, when studying spontaneous speech, highly controlled methods are challenging to implement. In this paper, we adopt an approach that considers the cognitive cost of syntactic structures for automatically quantifying syntactic complexity in spontaneous speech. We define syntactic complexity as the frequency of structures that are known to have a processing cost. We investigate those structures in natural speech samples produced in a picture description task by younger and older healthy participants. First, we show that older participants produce significantly fewer complex structures, which are identified manually in the transcripts. Second, to determine how to quantify the syntactic differences between the groups automatically, we examined three automatically derived metrics: 1. Direct assessment of complex syntactic structures; 2. Mean dependency distance; 3. Sentence length. Automated assessment of complex syntactic structures was the most successful metric in distinguishing between older and younger participants. Since this metric can be derived automatically, it can save considerable time, cost and effort compared to manually analyzing large-scale corpora, while maintaining high face validity and parsimony, suggesting that it is useful for studying syntactic complexity in spontaneous speech.
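Two of the three metrics named in the abstract, mean dependency distance and sentence length, can be illustrated with a minimal sketch. It assumes a common definition of dependency distance (the absolute difference between a token's position and its head's position, averaged over all non-root tokens); the preprint may define or normalize the metric differently. The function names and the example parse are hypothetical and are not taken from the paper.

```python
from typing import List


def mean_dependency_distance(heads: List[int]) -> float:
    """Average |token position - head position| over non-root tokens.

    Assumption: heads[i] is the 1-based position of the head of
    token i + 1, and 0 marks the root of the sentence.
    """
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances) if distances else 0.0


def sentence_length(heads: List[int]) -> int:
    """Sentence length in tokens (one head entry per token)."""
    return len(heads)


# Hypothetical parse of "the boy takes a cookie":
# the -> boy, boy -> takes, takes -> ROOT, a -> cookie, cookie -> takes
example_heads = [2, 3, 0, 5, 3]

mdd = mean_dependency_distance(example_heads)
length = sentence_length(example_heads)
```

In practice the head indices would come from a dependency parser run over the transcripts; they are hard-coded here so the sketch stays self-contained.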

Keyword(s)

syntactic complexity; speech; aging; natural language processing (NLP); syntax

Persistent Identifier

https://doi.org/10.23668/psycharchives.12331

Date of first publication

2022-12-30

Publisher

PsychArchives

preprint.pdf

Adobe PDF - 453.54KB

MD5: cf5b4901cb6aeb0a471e0f2ca8fb78b7

Sharing Level 0+ (Public Use) CC-BY 4.0

Version 2 (2023-08-21): Expanded the comparison from three metrics of syntactic complexity to eight, compared the effects of automated vs. manual transcription on the performance of these metrics, and added k-fold cross-validation to the assessment of the metrics' performance.

Version 1 (2022-12-30)

PsychArchives acquisition timestamp

2022-12-30T08:11:11Z
Made available on

2022-12-30T08:11:11Z
Publication status

other
Review status

notReviewed
Persistent Identifier

https://hdl.handle.net/20.500.12034/7872
Language of content

eng
Dewey Decimal Classification number(s)

150
DRO type

preprint