This repository contains the code for preprocessing the CoLAGaze text, eye-tracking, and behavioral data and for conducting the data analysis described in the paper *CoLAGaze: A Corpus of Eye Movements for Linguistic Acceptability*.
The data is stored on OSF. It includes the following components:
- Annotated stimuli: Text materials with grammaticality annotations, corresponding Areas of Interest (AOI) files, and associated textual features.
- Eye-tracking data: Provided in multiple formats, including raw .edf files, ASCII exports, gaze events (e.g., fixations and saccades) before and after vertical drift correction, and reading measures at both the word and sentence levels.
- Data quality reports: Documentation reporting calibration and validation scores, data loss ratios, blink rates, and dwell time on stimuli.
- Plots: Trial-level visualizations including main sequence plots, trace plots, and gaze event plots.
- Behavioral data: Participants’ responses to comprehension questions and grammatical acceptability judgments.
- Participant metadata: Demographic and sociolinguistic background information provided via questionnaires.
CoLAGaze is integrated into the pymovements package and can be accessed with either Python or R code.
Python code:

```python
import pymovements as pm

# Specify the dataset name and the local data directory.
dataset = pm.Dataset('CoLAGaze', path='data/')

# Download the dataset.
dataset.download()
```
R code:

```r
# pymovements is accessed from R through reticulate.
library(reticulate)

pm <- import("pymovements")

# Specify the dataset name and the local data directory.
dataset <- pm$Dataset('CoLAGaze', path = 'data/')

# Download the dataset.
dataset$download()
```
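After downloading, the gaze recordings can also be loaded into memory through the same `Dataset` object. A minimal Python sketch, assuming the default loading options apply to CoLAGaze (see the pymovements documentation for details):

```python
import pymovements as pm

# Initialise the dataset definition with a local data directory.
dataset = pm.Dataset('CoLAGaze', path='data/')

# Download the files and load the gaze data into memory.
dataset.download()
dataset.load()

# dataset.gaze holds one gaze data frame per recording.
print(dataset.gaze[0])
```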
- Stimuli data processing
  - map character-level AOIs to word-level AOIs: word_index_mapping.py
  - compute word-level surprisal: compute_surprisal.py (see the surprisal sketch after this list)
  - compute word-level textual features (length, lemma frequency) and sentence-level textual features: compute_lemma_frequency.py, compute Berzak's_scanpathreg_feat.py (see the lexical-features sketch after this list)
- Behavioral data processing
  - check the accuracy of the comprehension question responses: CQ_responses_accuracy_check.py
  - check the accuracy of the grammatical acceptability judgment responses: grammaticality_responses_accuracy_check.py
- Gaze data processing
  - denoise the data and extract gaze events (fixations and saccades): preprocessing.py
  - correct vertical drift for some trials (manual correction): vertical drift correction.py
  - compute word-level and sentence-level reading measures: map_events_compute_features.py (see the reading-measures sketch after this list)
  - compute scanpath regularity based on "What is the scanpath signature of syntactic reanalysis?": compute Berzak's_scanpathreg_feat.py
  - compute features from "Predicting Native Language from Gaze": compute Berzak's_scanpathreg_feat.py
  - create main sequence plots, trace plots, and gaze event plots: create_plots.py
  - analyse the differences between reading measures for grammatical and ungrammatical sentences and visualise the results: CoLAGaze analysis.R
  - extract calibration and validation quality: calibration_validation_report.py
  - compute dwell time on stimuli and skipping rate for content words and function words: dwell_time_skip_rate_report.py
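Surprisal sketch: word-level surprisal is typically obtained from an autoregressive language model. The snippet below is a minimal, hypothetical illustration using GPT-2 via the transformers library; compute_surprisal.py may use a different model, log base, and subword-to-word aggregation.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used here only as an example model (assumption, not the repository's choice).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentence = "The cat sat on the mat"  # hypothetical example stimulus

encoding = tokenizer(sentence, return_tensors="pt")
input_ids = encoding["input_ids"]

with torch.no_grad():
    logits = model(input_ids).logits

# Token surprisal: -log2 p(token_t | tokens_<t); the first token has no left context.
log_probs = torch.log_softmax(logits, dim=-1)
token_surprisals = [0.0]
for t in range(1, input_ids.size(1)):
    logp = log_probs[0, t - 1, input_ids[0, t]].item()
    token_surprisals.append(-logp / math.log(2))

# Sum subword surprisals into whitespace-delimited words via the fast tokenizer's word map.
word_surprisal = {}
for position, word_id in enumerate(encoding.word_ids(0)):
    if word_id is not None:
        word_surprisal[word_id] = word_surprisal.get(word_id, 0.0) + token_surprisals[position]

for word_id, word in enumerate(sentence.split()):
    print(word, round(word_surprisal.get(word_id, 0.0), 2))
```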
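Lexical-features sketch: a minimal illustration of word length and lemma frequency features, assuming spaCy for lemmatisation and the wordfreq package for frequency norms; the example sentence and resources are illustrative, and compute_lemma_frequency.py may rely on different ones.

```python
import spacy
from wordfreq import zipf_frequency

# Requires the small English pipeline: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

sentence = "The cats were sleeping on the sofa"  # hypothetical example stimulus

for token in nlp(sentence):
    print({
        "word": token.text,
        "length": len(token.text),                                   # word length in characters
        "lemma": token.lemma_,                                       # lemma from spaCy
        "zipf_lemma_frequency": zipf_frequency(token.lemma_, "en"),  # Zipf-scale lemma frequency
    })
```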
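Reading-measures sketch: a minimal illustration of how standard word-level reading measures (first fixation duration, gaze duration, total reading time) can be derived from a word-indexed fixation sequence; map_events_compute_features.py computes the repository's actual feature set, which may differ.

```python
import pandas as pd

# Hypothetical fixation sequence for one trial, in temporal order, with
# fixations already mapped to word-level AOIs and durations in milliseconds.
fixations = pd.DataFrame({
    "word_index": [0, 1, 1, 3, 2, 3],
    "duration":   [180, 210, 160, 230, 190, 150],
})

measures = {}        # word_index -> reading measures
previous_word = None

for _, fixation in fixations.iterrows():
    word, duration = int(fixation["word_index"]), float(fixation["duration"])

    # Moving to a different word ends the previously fixated word's first pass.
    if previous_word is not None and previous_word != word:
        measures[previous_word]["first_pass_open"] = False

    entry = measures.setdefault(word, {
        "first_fixation_duration": 0.0,
        "gaze_duration": 0.0,
        "total_reading_time": 0.0,
        "first_pass_open": True,
    })

    entry["total_reading_time"] += duration                  # all fixations on the word
    if entry["first_pass_open"]:
        if entry["first_fixation_duration"] == 0.0:
            entry["first_fixation_duration"] = duration      # first fixation of the first pass
        entry["gaze_duration"] += duration                   # first-pass reading time

    previous_word = word

result = pd.DataFrame([
    {"word_index": w,
     "first_fixation_duration": m["first_fixation_duration"],
     "gaze_duration": m["gaze_duration"],
     "total_reading_time": m["total_reading_time"]}
    for w, m in sorted(measures.items())
])
print(result)
```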