Datasets in the IR-Group

Datasets on `linux2`

All datasets are located on `/datasets`, a volume reserved exclusively for datasets such as IR test collections, document corpora, and other data used in our research.

| Dataset | Creator | Year | Size (zipped) | Type | Use case |
| --- | --- | --- | --- | --- | --- |
| AOL | G. Pass, A. Chowdhury, C. Torgeson | 2006 | 2.1G | IR test collection | personalization, query reformulation, or other types of search research |
| semanticscholar | Waleed Ammar | 2019 | 46G | document corpus | ad-hoc retrieval |
| iSearch | Aalborg University | 2010 | 50G | IR test collection | integrated search and citation-based retrieval |
| Washington Post | NIST | 2018 | 1.5G | IR test collection | ad-hoc retrieval |
| Washington Post (v4) | NIST | 2021 | 2.4G | IR test collection | ad-hoc retrieval |
| Tipster 1/2/3 | NIST | 1994 | 1.3G | IR test collection | ad-hoc retrieval |
| TREC Disks 4/5 | NIST | 1997 | 820M | document corpus | ad-hoc retrieval |
| New York Times | Evan Sandhaus | 2008 | 1G | document corpus | ad-hoc retrieval |
| AQUAINT | David Graff | 2002 | 3G | document corpus | ad-hoc retrieval |
| GIRT4 | GESIS-IZ | 2006 | 110M | IR test collection | ad-hoc retrieval, domain-specific, multilingual |
| TripClick | Navid Rekabsaz, Oleg Lesota, Markus Schedl, Jon Brassey, Carsten Eickhoff | 2021 | 32.7G | click log dataset | ad-hoc retrieval, deep learning models |
| Yahoo-L18 | Yahoo! Research | 2009/10 | 1.3G | click log dataset | ad-hoc retrieval, session analysis |
| Yandex - Personalized Web Search Challenge | Eugene Kharitonov, Pavel Serdyukov | 2014 | 5.9G | click log dataset | ad-hoc retrieval, session analysis |
| TREC-OpenSearch | TREC OpenSearch Organizers | 2016/17 | 600M | click log dataset | ad-hoc retrieval, session analysis |

Adding new Datasets

1. Log in to `linux2`.
2. Create a new folder for the dataset, copy `README.template.md` into it, and rename the copy to `README.md`.
3. Describe the dataset in `README.md`, following the template.
4. Copy all files for the dataset into the folder and add all binary files and folders to `.gitignore`.
5. Commit `README.md` and any additional files you would like to see on GitHub.
6. Update this page to include a brief description of the dataset.
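
The folder setup and `.gitignore` bookkeeping from the steps above could be scripted roughly as follows. This is only a sketch, not part of the repository: the function names `scaffold_dataset` and `ignore_binaries` and the `BINARY_SUFFIXES` list are hypothetical, and it assumes the script is run against the repository root that contains `README.template.md` and `.gitignore`. Describing the dataset and committing are still done by hand.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumption: which file suffixes count as "binary" is a guess; adjust per dataset.
BINARY_SUFFIXES = {".zip", ".gz", ".tgz", ".tar", ".bz2", ".7z"}


def scaffold_dataset(repo_root: Path, name: str) -> Path:
    """Create <repo_root>/<name> and copy the README template into it as README.md."""
    dataset_dir = repo_root / name
    dataset_dir.mkdir(exist_ok=False)  # fail rather than touch an existing dataset
    shutil.copyfile(repo_root / "README.template.md", dataset_dir / "README.md")
    return dataset_dir


def ignore_binaries(repo_root: Path, dataset_dir: Path) -> list[str]:
    """Append binary files under dataset_dir to .gitignore and return the new entries."""
    gitignore = repo_root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = set(text.splitlines())

    new_entries = []
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in BINARY_SUFFIXES:
            entry = path.relative_to(repo_root).as_posix()
            if entry not in existing:
                new_entries.append(entry)

    if new_entries:
        with gitignore.open("a") as fh:
            # Keep entries on their own lines even if the file lacks a final newline.
            if text and not text.endswith("\n"):
                fh.write("\n")
            fh.write("\n".join(new_entries) + "\n")
    return new_entries
```

A typical call would be `d = scaffold_dataset(Path("."), "my-dataset")`, followed by copying the data into `d` and then running `ignore_binaries(Path("."), d)` before committing.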
