Google Research Datasets

egotempo Public

Jupyter Notebook 11 CC-BY-4.0 0 2 0 Updated Apr 26, 2025
artydiqa Public
ArTyDi-QA is a dataset for Question Answering (QA) and Question Generation (QG) in Modern Standard Arabic (MSA), adapted from TyDiQA. It features extractive QA where models find answer spans or identify unanswerable questions, and a QG task involving formulating questions from context and answer pairs.

0 0 0 0 Updated Apr 23, 2025
Amplify_SSA Public
An annotated dataset of 8,091 adversarial queries in seven Sub-Saharan African languages.

Jupyter Notebook 0 0 0 0 Updated Apr 18, 2025
web-images Public
Images gathered from the Internet in 2023, with some accompanying metadata.

HTML 2 2 0 0 Updated Mar 19, 2025
screen_qa Public
The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico. It is intended for training and evaluating models that understand screen content via question answering.

Python 114 CC-BY-4.0 8 3 0 Updated Feb 7, 2025
adversarial-nibbler Public
This dataset contains results from all rounds of Adversarial Nibbler. This data includes adversarial prompts fed into public generative text2image models and validations for unsafe images. There will be two sets of data: all prompts submitted and all prompts attempted (sent to t2i models but not submitted as unsafe).

21 CC-BY-4.0 3 0 0 Updated Feb 3, 2025
cube Public
CUBE is a benchmark for evaluating the cultural competence of text-to-image (T2I) models.

8 CC-BY-4.0 0 3 0 Updated Jan 20, 2025
global_streamflow_model_paper Public

Jupyter Notebook 57 Apache-2.0 16 3 0 Updated Jan 17, 2025
hiertext Public
The HierText dataset contains ~12k images from the Open Images dataset v6 with a large number of text entities. We provide word-, line-, and paragraph-level annotations.

Jupyter Notebook 283 CC-BY-SA-4.0 25 0 1 Updated Dec 2, 2024
scin Public
The SCIN dataset contains 10,000+ images of dermatology conditions, crowdsourced with informed consent from US internet users. Contributions include self-reported demographic and symptom information and dermatologist labels. The dataset also contains estimated Fitzpatrick skin type and Monk Skin Tone.

Jupyter Notebook 108 10 2 0 Updated Nov 23, 2024
