This repository contains the code for preprocessing the CoLAGaze text, eye-tracking and behavioral data and conducting the data analysis described in the paper CoLAGaze: A Corpus of Eye Movements for Linguistic Acceptability.

DiLi-Lab/colagaze-processing

CoLAGaze

This repository contains the code for preprocessing the CoLAGaze text, eye-tracking and behavioral data and conducting the data analysis described in the paper CoLAGaze: A Corpus of Eye Movements for Linguistic Acceptability.

Accessing the data via OSF

The data is stored on OSF. It includes the following components:

  • Annotated stimuli: Text materials with grammaticality annotations, corresponding Areas of Interest (AOI) files, and associated textual features.

  • Eye-tracking data: Provided in multiple formats, including raw .edf files, ASCII exports, gaze events (e.g., fixations and saccades) before and after vertical drift correction, and reading measures at both the word and sentence levels.

  • Data quality reports: Documentation reporting calibration and validation scores, data loss ratios, blink rates, and dwell time on stimuli.

  • Plots: Trial-level visualizations including main sequence plots, trace plots, and gaze event plots.

  • Behavioral data: Participants’ responses to comprehension questions and grammatical acceptability judgments.

  • Participant metadata: Demographic and sociolinguistic background information provided via questionnaires.

Accessing the data via pymovements

CoLAGaze is integrated into the pymovements package and can be accessed from Python or R.

Python code:

import pymovements as pm

# Specify the dataset name and the local data directory.
dataset = pm.Dataset('CoLAGaze', path='data/')
# Download the dataset.
dataset.download()

R code:

# reticulate provides import() for loading Python modules from R.
library(reticulate)
pm <- import("pymovements")
# Specify the dataset name and the local data directory.
dataset <- pm$Dataset('CoLAGaze', path = 'data/')
# Download the dataset.
dataset$download()

Data processing

  • Stimuli data processing
    • map character-level AOIs to word-level AOIs: word_index_mapping.py
    • compute word-level surprisal: compute_surprisal.py
    • compute word-level textual features (length, lemma frequency) and sentence-level textual features: compute_lemma_frequency.py, compute Berzak's_scanpathreg_feat.py
  • Behavioral data processing
    • check the accuracy of the comprehension question responses: CQ_responses_accuracy_check.py
    • check the accuracy of the grammatical acceptability judgment responses: grammaticality_responses_accuracy_check.py
  • Gaze data processing
    • denoise the data and extract gaze events (fixations and saccades): preprocessing.py
    • correct vertical drift for some trials (manual correction): vertical drift correction.py
    • compute word-level and sentence-level reading measures: map_events_compute_features.py
    • compute scanpath regularity based on "What is the scanpath signature of syntactic reanalysis?": compute Berzak's_scanpathreg_feat.py
    • compute features from "Predicting Native Language from Gaze": compute Berzak's_scanpathreg_feat.py
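
To illustrate what word-level reading measures look like once fixations have been mapped to word AOIs, here is a minimal Python sketch. It is not taken from map_events_compute_features.py: the function name, the input layout (a time-ordered list of word index and fixation duration pairs) and the choice of measures are assumptions made for this example, and the repository's script may define its measures differently.

```python
def word_reading_measures(fixations, n_words):
    """Compute simple per-word reading measures (illustrative sketch only).

    fixations: time-ordered list of (word_index, duration_ms); word_index is
               None for a fixation that landed outside every word AOI.
    n_words:   number of words in the sentence.
    """
    measures = [
        {
            "first_fixation_duration": 0,  # duration of the first fixation on the word
            "first_run_duration": 0,       # sum over the first consecutive run on the word
            "total_fixation_time": 0,      # sum over all fixations on the word
        }
        for _ in range(n_words)
    ]
    first_run_closed = [False] * n_words
    previous_word = None

    for word_index, duration in fixations:
        # Moving away from a word ends its first run of fixations.
        if previous_word is not None and previous_word != word_index:
            first_run_closed[previous_word] = True
        previous_word = word_index
        if word_index is None:
            continue

        word = measures[word_index]
        if word["total_fixation_time"] == 0:
            word["first_fixation_duration"] = duration
        word["total_fixation_time"] += duration
        if not first_run_closed[word_index]:
            word["first_run_duration"] += duration

    return measures


# Toy input: word 0 is fixated, then word 1 twice, then a regression to word 0.
toy_fixations = [(0, 200), (1, 150), (1, 100), (0, 120), (None, 80), (2, 300)]
toy_measures = word_reading_measures(toy_fixations, n_words=4)
```

In this simplified version, the first run does not check whether a later word was fixated before the word was first entered, which stricter first-pass definitions require. Words that are never fixated keep zeros in all three measures.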

Visualisation

  • create main sequence plots, trace plots, and gaze event plots: create_plots.py
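
A main sequence plot relates each saccade's amplitude to its peak velocity. The sketch below shows one way the two quantities could be derived for a single saccade before plotting. It is not the code in create_plots.py: the function name, the assumption that gaze positions are already in degrees of visual angle, and the fixed sampling rate are all illustrative.

```python
import math


def saccade_amplitude_and_peak_velocity(samples, sampling_rate_hz):
    """Return (amplitude_deg, peak_velocity_deg_per_s) for one saccade.

    samples: list of (x_deg, y_deg) gaze positions within the saccade,
             assumed to be evenly spaced in time.
    """
    (x_start, y_start), (x_end, y_end) = samples[0], samples[-1]
    # Amplitude: straight-line distance between saccade onset and offset.
    amplitude = math.hypot(x_end - x_start, y_end - y_start)

    # Peak velocity: largest sample-to-sample speed within the saccade.
    dt = 1.0 / sampling_rate_hz
    peak_velocity = max(
        math.hypot(bx - ax, by - ay) / dt
        for (ax, ay), (bx, by) in zip(samples, samples[1:])
    )
    return amplitude, peak_velocity


# Toy horizontal saccade of 1 degree, sampled at 1000 Hz.
toy_samples = [(0.0, 0.0), (0.1, 0.0), (0.4, 0.0), (0.9, 0.0), (1.0, 0.0)]
toy_amplitude, toy_peak = saccade_amplitude_and_peak_velocity(toy_samples, 1000)
```

Plotting one point per saccade, with amplitude on the x-axis and peak velocity on the y-axis, gives the main sequence.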

Analysis

  • analyse the difference between reading measures for grammatical and ungrammatical sentences and visualise the results: CoLAGaze analysis.R
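
As a rough illustration of the comparison described above, the following Python sketch computes the mean of one reading measure per grammaticality condition and their difference. It does not reproduce CoLAGaze analysis.R, which is written in R. The row layout, the column names and the toy values are assumptions made for this example only.

```python
from statistics import mean


def mean_difference(rows, measure):
    """Compare the mean of `measure` between the two conditions.

    rows: list of dicts with a boolean 'grammatical' key and a numeric
          value stored under `measure` (hypothetical layout).
    """
    grammatical = [row[measure] for row in rows if row["grammatical"]]
    ungrammatical = [row[measure] for row in rows if not row["grammatical"]]
    return {
        "grammatical_mean": mean(grammatical),
        "ungrammatical_mean": mean(ungrammatical),
        "difference": mean(ungrammatical) - mean(grammatical),
    }


# Toy values, not taken from the corpus.
toy_rows = [
    {"grammatical": True, "total_fixation_time": 200},
    {"grammatical": True, "total_fixation_time": 240},
    {"grammatical": False, "total_fixation_time": 300},
    {"grammatical": False, "total_fixation_time": 340},
]
toy_summary = mean_difference(toy_rows, "total_fixation_time")
```

A descriptive summary like this is only a starting point; it ignores participant and item variability, which a full statistical analysis would need to account for.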

Data quality reports

  • extract calibration and validation quality: calibration_validation_report.py
  • compute dwell time on stimuli and skipping rate for content words and function words: dwell_time_skip_rate_report.py
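
The skipping rate per word class can be sketched as the share of words in that class that received no fixation. The example below is a minimal illustration, not the code in dwell_time_skip_rate_report.py: the function name and input layout are hypothetical, and treating a word as skipped when its total fixation time is zero is an assumption (a first-pass definition of skipping would give different values).

```python
def skipping_rates(words):
    """Return the proportion of skipped words per word class.

    words: list of (word_class, total_fixation_time_ms), where word_class
           is e.g. 'content' or 'function' (hypothetical layout).
    """
    counts = {}  # word_class -> (number of words, number of skipped words)
    for word_class, total_fixation_time in words:
        total, skipped = counts.get(word_class, (0, 0))
        # A word counts as skipped here if it was never fixated.
        counts[word_class] = (total + 1, skipped + (total_fixation_time == 0))
    return {
        word_class: skipped / total
        for word_class, (total, skipped) in counts.items()
    }


# Toy values, not taken from the corpus.
toy_words = [
    ("content", 250), ("function", 0), ("content", 0),
    ("function", 0), ("content", 180), ("function", 120),
]
toy_rates = skipping_rates(toy_words)
```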
