This repository contains the code for preprocessing the CoLAGaze text, eye-tracking, and behavioral data and for conducting the data analysis described in the paper *CoLAGaze: A Corpus of Eye Movements for Linguistic Acceptability*.
The data is stored on OSF. It includes the following components:
- Annotated stimuli: Text materials with grammaticality annotations, corresponding Areas of Interest (AOI) files, and associated textual features.
- Eye-tracking data: Provided in multiple formats, including raw .edf files, ASCII exports, gaze events (e.g., fixations and saccades) before and after vertical drift correction, and reading measures at both the word and sentence levels.
- Data quality reports: Documentation reporting calibration and validation scores, data loss ratios, blink rates, and dwell time on stimuli.
- Plots: Trial-level visualizations including main sequence plots, trace plots, and gaze event plots.
- Behavioral data: Participants’ responses to comprehension questions and grammatical acceptability judgments.
- Participant metadata: Demographic and sociolinguistic background information provided via questionnaires.
CoLAGaze is integrated into the pymovements package and can be accessed with either Python or R code.
Python code:

```python
import pymovements as pm

# Specify the dataset name and the local data directory.
dataset = pm.Dataset('CoLAGaze', path='data/')

# Download the dataset.
dataset.download()
```
R code:

```r
# pymovements is accessed from R through reticulate.
library(reticulate)

pm <- import("pymovements")

# Specify the dataset name and the local data directory.
dataset <- pm$Dataset('CoLAGaze', path = 'data/')

# Download the dataset.
dataset$download()
```
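After downloading, the gaze recordings can also be loaded into memory through the same `Dataset` object. A minimal Python sketch, assuming the default loading options apply to CoLAGaze (see the pymovements documentation for details):

```python
import pymovements as pm

# Initialise the dataset definition with a local data directory.
dataset = pm.Dataset('CoLAGaze', path='data/')

# Download the files and load the gaze data into memory.
dataset.download()
dataset.load()

# dataset.gaze holds one gaze data frame per recording.
print(dataset.gaze[0])
```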
- Stimuli data processing
  - map character-level AOIs to word-level AOIs: word_index_mapping.py
  - compute word-level surprisal: compute_surprisal.py (see the surprisal sketch after this list)
  - compute word-level textual features (length, lemma frequency) and sentence-level textual features: compute_lemma_frequency.py, compute Berzak's_scanpathreg_feat.py (see the lexical-features sketch after this list)
- Behavioral data processing
  - check the accuracy of the comprehension question responses: CQ_responses_accuracy_check.py
  - check the accuracy of the grammatical acceptability judgment responses: grammaticality_responses_accuracy_check.py
- Gaze data processing
  - denoise the data and extract gaze events (fixations and saccades): preprocessing.py
  - correct vertical drift for some trials (manual correction): vertical drift correction.py
  - compute word-level and sentence-level reading measures: map_events_compute_features.py (see the reading-measures sketch after this list)
  - compute scanpath regularity based on "What is the scanpath signature of syntactic reanalysis?": compute Berzak's_scanpathreg_feat.py
  - compute features from "Predicting Native Language from Gaze": compute Berzak's_scanpathreg_feat.py
  - create main sequence plots, trace plots, and gaze event plots: create_plots.py
  - analyse the differences between reading measures for grammatical and ungrammatical sentences and visualise the results: CoLAGaze analysis.R
  - extract calibration and validation quality: calibration_validation_report.py
  - compute dwell time on stimuli and skipping rate for content words and function words: dwell_time_skip_rate_report.py
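Surprisal sketch: word-level surprisal is typically obtained from an autoregressive language model. The snippet below is a minimal, hypothetical illustration using GPT-2 via the transformers library; compute_surprisal.py may use a different model, log base, and subword-to-word aggregation.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used here only as an example model (assumption, not the repository's choice).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentence = "The cat sat on the mat"  # hypothetical example stimulus

encoding = tokenizer(sentence, return_tensors="pt")
input_ids = encoding["input_ids"]

with torch.no_grad():
    logits = model(input_ids).logits

# Token surprisal: -log2 p(token_t | tokens_<t); the first token has no left context.
log_probs = torch.log_softmax(logits, dim=-1)
token_surprisals = [0.0]
for t in range(1, input_ids.size(1)):
    logp = log_probs[0, t - 1, input_ids[0, t]].item()
    token_surprisals.append(-logp / math.log(2))

# Sum subword surprisals into whitespace-delimited words via the fast tokenizer's word map.
word_surprisal = {}
for position, word_id in enumerate(encoding.word_ids(0)):
    if word_id is not None:
        word_surprisal[word_id] = word_surprisal.get(word_id, 0.0) + token_surprisals[position]

for word_id, word in enumerate(sentence.split()):
    print(word, round(word_surprisal.get(word_id, 0.0), 2))
```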
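Lexical-features sketch: a minimal illustration of word length and lemma frequency features, assuming spaCy for lemmatisation and the wordfreq package for frequency norms; the example sentence and resources are illustrative, and compute_lemma_frequency.py may rely on different ones.

```python
import spacy
from wordfreq import zipf_frequency

# Requires the small English pipeline: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

sentence = "The cats were sleeping on the sofa"  # hypothetical example stimulus

for token in nlp(sentence):
    print({
        "word": token.text,
        "length": len(token.text),                                   # word length in characters
        "lemma": token.lemma_,                                       # lemma from spaCy
        "zipf_lemma_frequency": zipf_frequency(token.lemma_, "en"),  # Zipf-scale lemma frequency
    })
```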
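Reading-measures sketch: a minimal illustration of how standard word-level reading measures (first fixation duration, gaze duration, total reading time) can be derived from a word-indexed fixation sequence; map_events_compute_features.py computes the repository's actual feature set, which may differ.

```python
import pandas as pd

# Hypothetical fixation sequence for one trial, in temporal order, with
# fixations already mapped to word-level AOIs and durations in milliseconds.
fixations = pd.DataFrame({
    "word_index": [0, 1, 1, 3, 2, 3],
    "duration":   [180, 210, 160, 230, 190, 150],
})

measures = {}        # word_index -> reading measures
previous_word = None

for _, fixation in fixations.iterrows():
    word, duration = int(fixation["word_index"]), float(fixation["duration"])

    # Moving to a different word ends the previously fixated word's first pass.
    if previous_word is not None and previous_word != word:
        measures[previous_word]["first_pass_open"] = False

    entry = measures.setdefault(word, {
        "first_fixation_duration": 0.0,
        "gaze_duration": 0.0,
        "total_reading_time": 0.0,
        "first_pass_open": True,
    })

    entry["total_reading_time"] += duration                  # all fixations on the word
    if entry["first_pass_open"]:
        if entry["first_fixation_duration"] == 0.0:
            entry["first_fixation_duration"] = duration      # first fixation of the first pass
        entry["gaze_duration"] += duration                   # first-pass reading time

    previous_word = word

result = pd.DataFrame([
    {"word_index": w,
     "first_fixation_duration": m["first_fixation_duration"],
     "gaze_duration": m["gaze_duration"],
     "total_reading_time": m["total_reading_time"]}
    for w, m in sorted(measures.items())
])
print(result)
```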