MultiplEYE: Enabling multilingual eye-tracking data collection for human and machine language processing research

The MultiplEYE consortium; Jäger, Lena A.; Hollenstein, Nora; Matić Škorić, Ana; Jakobi, Deborah N.; Stegenwallner-Schütz, Maja; Ding, Cui; Pavlinušić Vilus, Eva; Kasperė, Ramunė; Müller, Marie-Luise

Preregistration

Abstract / Description

Eye-tracking is a gold-standard method for studying reading and language comprehension, yet the field lacks large-scale, multilingual datasets collected under standardized and FAIR-compliant conditions. This preregistration describes a large-scale, international eye-tracking-while-reading study conducted across multiple testing sites as part of the COST Action MultiplEYE (CA21131). Participants from diverse linguistic backgrounds read short naturalistic texts while their eye movements are recorded using a harmonized experimental protocol. The stimulus materials consist of parallel texts across languages and genres, enabling systematic cross-linguistic comparisons of reading behavior as well as comparison across different text types and levels of complexity. In addition to the reading task and comprehension questions, demographic information is collected for all participants, and a subset of sites administers standardized psychometric tests assessing a range of cognitive and linguistic abilities. Data collection, preprocessing, quality control, and documentation follow jointly defined standards to ensure comparability and reproducibility across sites. The resulting multilingual corpus will be openly shared via EyeStore, a FAIR-compliant repository hosted by the Research Data Center at the Leibniz Institute for Psychology (RDC at ZPID), providing a sustainable resource for research in psychology, linguistics and machine learning.

Keyword(s)

preregistration, eyetracking, reading, language processing, cross-linguistic, psycholinguistics, Eye-tracking, language comprehension, natural language processing, multilingual, low-resource languages, FAIR data, parallel corpus, open science

Persistent Identifier

https://doi.org/10.23668/psycharchives.21607

PsychArchives acquisition timestamp

2026-01-28 18:06:06 UTC

Publisher

PsychArchives

Preregistration_MultiplEYE.pdf

Adobe PDF - 434.06KB

MD5 : c79b7a594a46e1f9c83b686051327fc2

Sharing Level 0 (Public Use) CC-BY-SA 4.0
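
The MD5 value listed above can be used to check that a downloaded copy of the PDF is intact. The following is a minimal sketch, assuming the file has been saved locally under its listed name; the helper names `md5_of_file` and `matches_record` and the local path are illustrative and not part of PsychArchives.

```python
import hashlib
from pathlib import Path

# Checksum as published in the record above.
EXPECTED_MD5 = "c79b7a594a46e1f9c83b686051327fc2"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    pdf = Path("Preregistration_MultiplEYE.pdf")  # assumed local download path
    if pdf.exists():
        print("checksum OK" if matches_record(pdf) else "checksum MISMATCH")
    else:
        print(f"{pdf} not found; download it first")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a safeguard against deliberate tampering.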

Is related to

Report
MultiplEYE Data Collection Guidelines

Hollenstein, Nora & Müller, Marie-Luise & Jakobi, Deborah N. & Ding, Cui & Stegenwallner-Schütz, Maja & Matić, Ana & Pavlinušić Vilus, Eva & Kasperė, Ramunė & Bondar, Anna & Filip, Maroš & Frank, Stefan & Hofmann, Jana & Krosness, Thyra & Lõo, Kaidi & Nedergaard, Johanne & Tschirner, Chiara & Jäger, Lena A., 2026-03-05, PsychArchives

The MultiplEYE Data Collection Guidelines, developed within COST Action MultiplEYE (CA21131), support the collaborative creation of a large-scale, multilingual eye-tracking-while-reading corpus. This data can be used to study human language processing from a psycholinguistic perspective as well as to improve and evaluate computational language models via machine learning. To ensure methodological consistency and comparability across participating labs, a standardized experiment and data collection protocol has been implemented. The Guidelines provide detailed instructions for the entire data collection process, including preparation, experiment implementation, documentation, and contribution of data to the corpus.

Research Data
The MultiplEYE Text Corpus Data and Materials

Nisioi, Sergiu & Bondar, Anna & Kasperė, Ramunė & Stegenwallner-Schütz, Maja, 2026-03-11, PsychArchives

Data and materials for the 39 language versions of the MultiplEYE Text Corpus pertaining to Kasperė, Bondar, Nisioi, Stegenwallner-Schütz et al. (2026). Text Corpus: Towards a Diverse and Ever-Expanding Multilingual Text Corpus. Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026). European Language Resources Association. For each language version, the repository includes three file types: (1) a stimuli-experiment file containing paginated stimulus texts, (2) a metadata file containing bibliographic and provenance information for each text, and (3) token-level linguistic annotation files. In addition, the repository provides (4) a pagination correspondence table documenting the alignment between each language version's pagination and the English reference pagination, and (5) a language coordinator list identifying the individuals coordinating the compilation of texts for each language version of the MultiplEYE Text Corpus.

There are no other versions of this object.

Made available on

2026-01-28T18:06:06Z
Date of first publication

2026-01-28
Publication status

other
Review status

unknown
Sponsorship

This study is part of a broader collaborative initiative supported by the MultiplEYE COST Action, funded by the European Union through the European Cooperation in Science and Technology (COST).
Persistent Identifier

https://hdl.handle.net/20.500.12034/16990
Language of content

eng
Is related to

https://www.psycharchives.org/handle/20.500.12034/17111
https://www.psycharchives.org/handle/20.500.12034/17126
Dewey Decimal Classification number(s)

150
DRO type

preregistration
Leibniz institute name(s) / abbreviation(s)

ZPID
Leibniz subject classification

Psychology
Language, Linguistics
Visible tag(s)

eyetracking
reading
open research data
psycholinguistics
language processing
cross-linguistic research
large-scale dataset
PRP-QUANT